What Grafana version and what operating system are you using?
I am using Grafana 7.0.0 on RHEL 7.3 Linux.
What are you trying to achieve?
I am running a query against influxdb for 6 months worth of data.
How are you trying to achieve it?
I have influxdb connected by using the datasources.
What happened?
The 24-hour timeframe works fine, but if I query 7 days' worth I get an out of memory error. I also have Chronograf run the same query on InfluxDB and I do not get any error; as a matter of fact, it is much faster.
What did you expect to happen?
We increased the memory from 16 GB to 32 GB and went from 6 processors to 12, but there is no change in the behavior.
Can you copy/paste the configuration(s) that you are having problems with?
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
I turned debug on in hopes of catching some issue but the logs were not very helpful. I did not get any error in the Grafana UI except out of memory error.
Did you follow any online instructions? If so, what is the URL?
No, I did not follow any URL. I did look through the GitHub repo issue list but did not find anything I could use to tweak the configuration.
Can someone suggest what I should do as next steps?
Interesting. Can you point out exactly where the error occurs? Do you see any errors at all in the Grafana logs? Does the Grafana server process actually crash due to OOM? Or is it just something that you see in the front end? If it's the latter, are there any more details in your browser's console or network request logs?
In my experience even a modestly sized Grafana server (4GB RAM) shouldn't have memory issues with heavy queries, so what you describe sounds peculiar.
Finally, and more generally, I guess you're not applying any aggregation over time in your query? It's worth considering whether you actually need to query the raw data; if you're simply plotting the data, then applying an aggregation may be sensible. Sorry if I have the wrong end of the stick here (I don't know your use case), but just a thought.
Thanks for your reply @svetb. The browser reports OOM. Yes, it is the raw data, and you got that right :). I have not started aggregating it yet; it is under consideration. When I use Chronograf, which is part of the TICK stack, the results come back in seconds and it does not time out or throw any errors. When Grafana reports OOM, I do not see anything in the logs that reports OOM. I am also running top in another terminal and there is plenty of memory available. Grafana, InfluxDB and Chronograf are all on the same server. We increased the memory from 16 GB to 32 GB last night and it has not made any difference. I had the SA check the system logs and there is no OOM in /var/log/messages etc. I have not made any changes in the config.ini for Grafana which could cause any issues. If I select 24 hours on the dashboards, I see no issues and the dashboard shows up at an acceptable speed. When I select a 7-day graph, I see the OOM in the browser. I did a browser reset (Edge) but there was no change. Chronograf performs regardless. Raw stats are collected once a minute. Also, the boss likes Grafana, which looks like Graphite.
Right, so your browser is running out of memory due to the volume of data being thrown at it. As you noticed, upgrading your server won't make a difference.
There may be various workarounds and tweaks you can try, but adding an aggregation is definitely the best solution. There's basically little point in querying - and feeding your browser - millions (?) of data points if all you really need is a chart on a screen with a resolution of ~2K pixels.
Happy to try and point you in the right direction if you're not sure how to approach that.
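To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes one raw sample per minute per series (as mentioned earlier in the thread) and a chart roughly 2000 pixels wide; the function names and the `series` parameter are just illustrative, not anything Grafana or InfluxDB provides.

```python
# Rough estimate of how many raw points an unaggregated query returns.
# Assumption: one sample per minute per series, as described in this thread.
SAMPLES_PER_DAY = 24 * 60  # one point per minute


def raw_points(days: int, series: int = 1) -> int:
    """Number of raw points returned for `series` series over `days` days."""
    return days * SAMPLES_PER_DAY * series


def points_per_pixel(days: int, width_px: int = 2000) -> float:
    """How many raw points of a single series land on each horizontal pixel."""
    return raw_points(days) / width_px


if __name__ == "__main__":
    for days in (1, 7, 90):
        print(days, raw_points(days), round(points_per_pixel(days), 1))
```

A 7-day window is 10,080 points per series, and that number is multiplied by every host in the GROUP BY and by every panel on the dashboard, so the browser can end up holding far more points than it has pixels to draw them on.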
Yes, I definitely appreciate your help in pointing me in the right direction. One thing I still cannot understand is why Grafana runs out of memory when Chronograf does not. In fact, I can run larger datasets in less than 30 seconds on Chronograf. I am using the same browser for both (the new Edge). If I can understand that, then I can look at workarounds, but so far I cannot find any reason. Is it because of the resolution?
Are you sure you're running exactly the same query in Grafana and Chronograf? Chronograf does do aggregation by default, unless you manually write a query that does not have it. In fact, that's the default behavior in Grafana also. So it's a bit hard to give you a good diagnostic without seeing the specific query/queries.
Either way, even if you're running the same query, it's possible that the Chronograf front end happens to be good at handling a payload with millions of points, while Grafana's isn't. Grafana does provide far more complex functionality for post-query data manipulation (i.e. in the front end), so it's possible that this causes it to be less good at handling massive payloads. I don't know... even though Grafana and Chronograf look kind of the same, they're very different tools, so I don't find it quite as surprising that their behavior might diverge when faced with an edge case.
I went ahead and installed Apache to act as a proxy, and it did a little better but obviously not well enough. So can you guide me on how to do an aggregation? Is that done using the Telegraf plugin?
Thanks for your help. Here are some queries from different dashboards:
Grafana:
SELECT "value" FROM "stat.avedur" WHERE $timeFilter GROUP BY "host"
SELECT "value" FROM "stat-amqp-store-step.count" WHERE $timeFilter GROUP BY "host"
SELECT "usage_idle" * -1 + 100 FROM "autogen"."cpu" WHERE ("cpu" = 'cpu-total' AND "tag" = 'totalcpu' AND "host" = 'servername1.example.com' OR "host" = 'servername2.example.com') AND $timeFilter GROUP BY "host"
Chronograf:
SELECT mean("value") AS "mean_value" FROM "DB_Name"."autogen"."stat.media-read-decrypt.avedur" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY time(:interval:), "host" FILL(null)
SELECT mean("value") AS "mean_value" FROM "DB_Name"."autogen"."stat.count" WHERE time > :dashboardTime: AND time < :upperDashboardTime: GROUP BY time(:interval:), "host" FILL(null)
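So your Grafana queries return raw data, while the Chronograf ones aggregate with mean() over time. The equivalent of your first Grafana query
SELECT "value" FROM "stat.avedur" WHERE $timeFilter GROUP BY "host"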
with a time aggregation is
SELECT mean("value") FROM "stat.avedur" WHERE $timeFilter GROUP BY time($__interval), "host"
You can also add a FILL(null) clause at the end, like in the Chronograf queries; I don't remember if that's really necessary or just a nice-to-have.
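If it helps to see what a time aggregation does to the data, here is a minimal Python sketch of the idea behind `mean(...) GROUP BY time(interval)`. It is only an illustration under stated assumptions: `pick_interval` loosely mimics choosing a bucket size from the time range and a target point count (Grafana's real `$__interval` calculation rounds to "nice" values and honors a configured minimum interval, which this does not), and `mean_by_time` is a hypothetical helper, not InfluxDB's implementation.

```python
import math
from collections import defaultdict


def pick_interval(range_seconds: int, max_points: int, min_interval: int = 60) -> int:
    """Bucket width in seconds so that roughly max_points buckets cover the range.

    Assumption: never go below min_interval (here the 1-minute collection rate).
    """
    return max(min_interval, math.ceil(range_seconds / max_points))


def mean_by_time(points, interval: int):
    """Average (unix_seconds, value) pairs into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket_start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [sum, count]
    for ts, value in points:
        bucket = ts - ts % interval  # start of the bucket this sample falls in
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    return [(bucket, total / count) for bucket, (total, count) in sorted(sums.items())]


if __name__ == "__main__":
    week = 7 * 24 * 3600
    raw = [(ts, float(ts % 100)) for ts in range(0, week, 60)]  # one point per minute
    interval = pick_interval(week, max_points=1000)
    reduced = mean_by_time(raw, interval)
    print(len(raw), interval, len(reduced))
```

The point is that the number of values sent to the browser is capped by the number of buckets rather than by the length of the time range, which is why a longer window stops being a problem once the query aggregates.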
Thanks very much @svetb. I changed the dashboard, which had about 75 graphs, and once I used the aggregation they loaded in a flash, even for a 90-day timeframe. I really appreciate your guidance.