Hi,
I tried to create a Loki NGINX dashboard similar to this one on Grafana, but I noticed some problems with some panels and their query results.
First, I removed the relative time from the total requests panel, and this is what I get:
27400 total requests.
The panel on the right shows this data:
If I have 27400 total requests, how is it possible that I also have 28600 requests with HTTP status code 200?
I checked both panels and they seem to have the same settings: no relative time, and both cover the last 6 hours (selected from the time picker in the upper-right corner).
Finally, there is also a problem with the percentage.
I changed the "% of 5xx requests" query to count HTTP 301 requests.
As you can see from the previous screenshots, the correct percentage should be 2.36%, but the result I get is:
I have similar issues in my own dashboard (I used the same queries as the demo linked at the beginning of this message), and I don’t know how to fix them.
Everything seems to be right, but I don’t know why these results are wrong.
Can you help me solve these issues? Is this a bug?
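One thing worth checking for the percentage panel (an assumption, since the panel settings are not shown): if the query returns one ratio per interval and the stat panel's Calculation option reduces it with something like Mean, the displayed value is an average of per-interval percentages, which generally differs from the percentage computed from the totals. The counts below are hypothetical, not taken from the screenshots:

```python
# Hypothetical per-interval counts (NOT the real dashboard data):
# each tuple is (requests with status 301, all requests) for one interval.
samples = [(1, 10), (0, 50), (3, 20), (2, 120)]


def ratio_of_totals(samples):
    """Percentage computed from the summed counts over the whole range."""
    hits = sum(h for h, _ in samples)
    total = sum(t for _, t in samples)
    return 100.0 * hits / total


def mean_of_ratios(samples):
    """Average of the per-interval percentages (what a Mean reduction shows)."""
    ratios = [100.0 * h / t for h, t in samples if t > 0]
    return sum(ratios) / len(ratios)


print(f"ratio of totals: {ratio_of_totals(samples):.2f}%")  # → 3.00%
print(f"mean of ratios:  {mean_of_ratios(samples):.2f}%")   # → 6.67%
```

Computing the ratio over the whole time range instead (for example, an instant query over [$__range]) avoids averaging per-interval percentages.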
I use Loki to collect logs.
Thanks in advance
Rather hard to troubleshoot without access to your dashboard. I’d try a couple of things:
- Run the identical query with an identical time frame and interval in Grafana Explore (and perhaps against the API endpoint for good measure), and compare the results.
- Share your dashboard JSON if possible, and perhaps your query as well.
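For the API comparison, here is a minimal sketch. It assumes Loki is reachable at http://localhost:3100 without authentication (a hypothetical address, adjust to your setup). It calls Loki's /loki/api/v1/query_range endpoint with the same query, time range and step as the panel, then adds up the returned samples per series, which mirrors a "Total" reduction in a stat panel:

```python
import json
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # hypothetical address, change as needed


def query_range(query, start_ns, end_ns, step_s):
    """Run a LogQL metric query against Loki's query_range endpoint."""
    params = urllib.parse.urlencode({
        "query": query,      # $__interval must be replaced, e.g. with [1m]
        "start": start_ns,   # nanosecond Unix epoch
        "end": end_ns,
        "step": step_s,      # seconds; should match the panel's interval
    })
    url = f"{LOKI_URL}/loki/api/v1/query_range?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def totals_per_series(response):
    """Sum every sample of each series in a matrix result."""
    totals = {}
    for series in response["data"]["result"]:
        key = tuple(sorted(series["metric"].items()))
        totals[key] = sum(float(value) for _, value in series["values"])
    return totals
```

Loki does not know Grafana's variables, so replace [$__interval] with a concrete duration such as [1m] and pass the matching step (60).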
Hi Tony,
I ran this query in Explore (I use the pattern version, which is nearly identical to the JSON version; I linked the JSON version of the dashboard because the pattern one doesn’t work on play.grafana.org):
sum by (status) (count_over_time({filename="/path/to/log/filename"} | pattern `<remote_addr> - - <ts> "<method> <path> <_>" <status> <body_bytes_sent> "<_>" "<user_agent>" <response_time>`
| __error__="" [$__interval]))
and I get 7 requests with status code 301 and 31 requests with status code 200 (38 requests in total).
I use the same query in a panel on my dashboard, and it shows 7 requests with status code 301 and 37 requests with status code 200. I use the same interval and time frame, so I don’t know why the panel shows 6 more requests with status code 200.
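A possible cause to rule out (an assumption, not confirmed for this setup): a range query evaluates count_over_time at every step from the start to the end of the time range, and each sample counts the lines in a window of length [$__interval] ending at that timestamp. Summing all samples therefore also counts lines from up to one interval before the selected start, and Explore and a dashboard panel can get a different $__interval because it depends on the panel width and max data points. The simulation below uses made-up timestamps and assumes left-open windows (t - window, t]; Loki's exact boundary handling and step alignment may differ:

```python
def summed_counts(event_times, start, end, step, window):
    """Simulate summing count_over_time(...[window]) sampled every `step`.

    Each sample at time t counts events with t - window < e <= t;
    samples are taken at start, start + step, ... while t <= end.
    """
    total = 0
    t = start
    while t <= end:
        total += sum(1 for e in event_times if t - window < e <= t)
        t += step
    return total


# Hypothetical log timestamps (seconds); the selected range is 100..220,
# so only 130, 150, 180 and 215 actually fall inside it (4 lines).
events = [55, 95, 130, 150, 180, 215]

print(summed_counts(events, 100, 220, 10, 10))  # → 5 (one line from before 100)
print(summed_counts(events, 100, 220, 60, 60))  # → 6 (two lines from before 100)
print(summed_counts(events, 100, 220, 10, 30))  # → 13 (window wider than step)
```

The same lines give different totals depending only on the step, which is one way two panels with "the same" query can disagree.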
The panel query is:
sum by (status) (count_over_time({} | pattern `<remote_addr> - - <ts> "<method> <path> <_>" <status> <body_bytes_sent> "<_>" "<user_agent>" <response_time>`
| status != "" and status != "-"
| __error__="" [$__interval]))
I also use this query for the total requests:
sum by (filename) (count_over_time({}[$__interval]))
The result it shows is inconsistent with the previous panel.
For example, the HTTP status code panel shows 37 responses with status code 200 and 7 with status code 301 (44 requests in total), while the total requests panel shows 42.
Right now I can’t share my dashboard.
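Because the status query filters lines out (status != "", status != "-", __error__=""), the sum of its series should never exceed the total for the same timestamps. Exporting both panels' series (for example from the panel's Inspect > Data view) and comparing them timestamp by timestamp shows whether the extra requests come from samples the other panel does not have. A minimal sketch with hypothetical values, not the real data:

```python
def diff_by_timestamp(status_series, total_series):
    """Compare the sum of per-status samples with the total per timestamp.

    status_series: {status: {timestamp: count}}
    total_series:  {timestamp: count}
    Returns {timestamp: (sum_of_statuses, total)} where the two disagree.
    """
    summed = {}
    for points in status_series.values():
        for ts, count in points.items():
            summed[ts] = summed.get(ts, 0) + count
    mismatches = {}
    for ts in sorted(set(summed) | set(total_series)):
        a, b = summed.get(ts, 0), total_series.get(ts, 0)
        if a != b:
            mismatches[ts] = (a, b)
    return mismatches


# Hypothetical exported samples, keyed by Unix timestamp in seconds.
by_status = {"200": {60: 10, 120: 12, 180: 15}, "301": {60: 2, 120: 5}}
totals = {120: 17, 180: 15, 240: 10}
print(diff_by_timestamp(by_status, totals))  # → {60: (12, 0), 240: (0, 10)}
```

Here the values agree wherever both panels have a sample, and the mismatch comes only from timestamps present in one series but not the other, which points at different step alignment rather than different counting.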
Please correct me if I am wrong, but are you saying that you are aggregating by status for the individual status panel, and aggregating by filename for the total?
Why not simply sum without any aggregation for the total?
I aggregate the total because I used the queries from the demo dashboard on play.grafana.org.
However, I changed the total requests query to sum(count_over_time({}[$__interval])), without aggregating by filename, but the result doesn’t change: with or without aggregation, the number of total requests is the same.
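That is expected if all selected lines come from a single filename: sum by (filename) keeps one output series per distinct filename value, so with only one file it returns the same single series as a plain sum. A small sketch of that grouping behavior; the "host" label and the values are hypothetical:

```python
def sum_by(series, labels=()):
    """Mimic LogQL/PromQL `sum by (labels)` over (metric_dict, value) pairs.

    Series missing a label are grouped under an empty value for it.
    With labels=() this behaves like a plain `sum` with no grouping.
    """
    out = {}
    for metric, value in series:
        key = tuple((name, metric.get(name, "")) for name in labels)
        out[key] = out.get(key, 0) + value
    return out


# Two hypothetical streams from the same file, differing only by "host".
one_file = [
    ({"filename": "/path/to/log/filename", "host": "a"}, 5),
    ({"filename": "/path/to/log/filename", "host": "b"}, 3),
]
print(sum_by(one_file, ("filename",)))  # one series with value 8
print(sum_by(one_file))                 # → {(): 8}
```

With a second filename, sum by (filename) would return two series and the stat panel would show one value per file, so the two forms only diverge when several files are selected.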