Running large queries once, then re-using the output as input for other SQL statements...?

Hi,

I’m still very new to Grafana. This is my second question in as many weeks.

I use SQL Server as my primary data source.

I’m creating a dashboard with many visualizations, and I’m finding a lot of the visualizations are re-using the same SQL query or two (or three), and then I apply further tweaks via Transform operations.

Maybe I’m doing this wrong, but what I’ve found works okay for me is to define visualizations (tables) in a row that I keep collapsed; call them “QueryA”, “QueryB”, “QueryC”, etc. Each one of those could, in theory, be relatively expensive to run, so I don’t want to re-invoke them all the time. In other rows, I define visualizations that use Dashboard as their data source (not SQL), and specify, for example, QueryB as the source panel. Then I apply transformations to tweak the SQL table returned and turn it into, say, a Stat panel or a pie chart. Another visualization might be defined the same way (also referring to QueryB as its data source) but with its own, different transformation.

My understanding is that this should be a good way to do this, as QueryB is hitting my SQL machine only once, and the same results are re-used for both visualizations (differing mainly in the Transform steps).

The problem is that I’m encountering a lot of instances where I just cannot quite get the output I want through transformations. I’m much more familiar with SQL, so what I find myself doing is repeating QueryB but tweaking the SQL statement “just so”, so that I obtain the results I want as a SQL table and can display it straight in a visualization without having to introduce Transform steps. It works, but I can’t help thinking this is inefficient, as it hits my server with very similar (but not identical) queries.

What I’m hoping for is a way to define my large and complex query (“QueryB”) only once (shared by the visualizations that need it), and then, for any visualization that needs it, apply a separate SQL query against the output of QueryB, rather than fussing about with Transform operations.

I suppose I could even have my expensive queries (QueryA, QueryB, QueryC) each write their output to a SQL temp table (#tempA, #tempB, #tempC) and then have my visualizations use a SQL data source querying those already-populated temp tables and adding the SQL tweaks to produce the final table desired (again avoiding transform steps). But I’m not certain how I could ensure that #tempA, #tempB and #tempC are all created before I get any visualization rendered. Typically I’d want to ensure the queries that create those temp tables are run as soon as a dashboard is loaded, but before any visualization tries to run the SQL queries that will use the temp tables.
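The pattern described above could be sketched roughly like this in T-SQL (all table and column names here are hypothetical, just for illustration; note that `#` temp tables live in tempdb and are scoped to the session that creates them):

```sql
-- Expensive query runs once, dropping and repopulating its temp table:
IF OBJECT_ID('tempdb..#tempB') IS NOT NULL
    DROP TABLE #tempB;

SELECT a.AlertId, a.Severity, c.ComputerName   -- hypothetical schema
INTO #tempB
FROM dbo.Alert AS a
JOIN dbo.Computer AS c ON c.ComputerId = a.ComputerId
WHERE a.RaisedDateTime >= DATEADD(DAY, -7, GETUTCDATE());

-- Later, each panel would run only a lightweight query against the temp table:
SELECT Severity, COUNT(*) AS AlertCount
FROM #tempB
GROUP BY Severity;
```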

Am I making sense? Is there something such as a blog article that someone’s already written that accomplishes something similar to what I’m trying to do, and goes over the steps one at a time?

I’m almost certain this (or something similar) is being done all the time; part of my problem when searching and googling at large might be my lack of familiarity with the terminology used. Is a variable what I’m after? A template? Something else?


I think you need a “materialized view”. Check the docs for your SQL Server version to see if it is supported and how it can be used.
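For reference, SQL Server’s closest equivalent to a materialized view is an indexed view; a minimal sketch (with hypothetical table and column names) might look like this:

```sql
-- Indexed views require SCHEMABINDING, and COUNT_BIG(*) when grouping:
CREATE VIEW dbo.vAlertSummary
WITH SCHEMABINDING
AS
SELECT c.ComputerName,
       COUNT_BIG(*) AS AlertCount
FROM dbo.Alert AS a
JOIN dbo.Computer AS c ON c.ComputerId = a.ComputerId
GROUP BY c.ComputerName;
GO

-- The unique clustered index is what "materializes" the view's result set:
CREATE UNIQUE CLUSTERED INDEX IX_vAlertSummary
ON dbo.vAlertSummary (ComputerName);
```

This only applies if you can create objects in the database (which, as it turns out later in the thread, may not be the case for a vendor database).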

Caching (at the data source level) on the Grafana side may also help to improve performance (it depends on what your queries look like).

Can you share your queries?

Also remember that each panel has its own datasource query, and each panel will hit the SQL server.

In most cases, the performance issues posted in this forum end up being issues with the query itself: the design of the tables, lack of indices (especially on columns used for filtering), use of raw SQL instead of persisted views (as @jangaraj rightly pointed out) or stored procedures, querying from Grafana against OLTP backends under heavy usage by other apps in the ecosystem, etc.

So showing us your queries, table schemas, indices, etc. might help resolve your performance issue.

Well, my data source is a standard Microsoft System Center Operations Manager (SCOM) database. As such, I’m not at liberty to modify its tables or indices, nor could I assume anyone using my dashboard(s) would be able to do the same. SCOM users might be able to run SQL queries directly off the DB, but its schema is essentially hands-off.

To boil things down, most of my dashboards revolve around 3 things: rules and monitors (which can raise alerts), computers, and alerts. I have 3 separate queries that each extract the subset of relevant rules/monitors, computers, and alerts into 3 separate temp tables. Assume the queries needed to build those temp tables are relatively expensive, but re-querying those temp tables once they exist (which is all each visualization needs to do: JOINing, adding more specific WHERE clauses, performing calculations, etc.) is much more lightweight, as the data that was expensive to retrieve is already in the temp tables.

So, ideally, I’m looking for something that’ll guarantee to have the queries that create the temp tables run first when my dashboard is loaded (and only once, if possible). Following that, all individual visualizations need to do is use SQL as my data source, but they’ll be running relatively simple queries targeted against the temp tables only.

At some point it would be nice to clear out the temp tables, but that’s another matter. Each of the 3 queries, being responsible for creating one temp table, starts by dropping it if it already exists…

Gotcha. So, at the end of the day, your issue is basically not Grafana, it’s your datasource.

Temporary tables are private to the session that created them. So you will hit the bottleneck again and again: as soon as one session ends, you will have to recreate them.
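To illustrate the scoping problem (a Grafana panel typically runs on its own pooled connection, i.e. a different session):

```sql
-- Session 1 (one panel's connection):
CREATE TABLE #tempB (AlertId int);   -- visible only to this session

-- Session 2 (another panel's connection) referencing #tempB would fail with
-- "Invalid object name '#tempB'".

-- A global temp table (##) is visible across sessions, but it is dropped
-- once the creating session ends and no other session references it:
CREATE TABLE ##tempB (AlertId int);
```

So even global temp tables are fragile here, since nothing guarantees the creating connection outlives the dashboard.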

I personally would go with a mini data warehouse populated on some schedule via SSIS, PowerShell, or Python. This is what we had to do for our SCCM SQL Server backend; it was just way too big to wrangle.
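A minimal sketch of that approach, assuming a separate reporting database you control (all names below are hypothetical): a stored procedure rebuilds a flattened table, and a SQL Server Agent job runs it on a schedule. Grafana panels then query only the flattened table.

```sql
-- Hypothetical refresh procedure for a flattened reporting table:
CREATE OR ALTER PROCEDURE reporting.RefreshAlertFlat
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE reporting.AlertFlat;

    -- Do the expensive joins once, on a schedule, not per panel:
    INSERT INTO reporting.AlertFlat
        (AlertId, ComputerName, RuleName, Severity, RaisedDateTime)
    SELECT a.AlertId, c.ComputerName, r.RuleName, a.Severity, a.RaisedDateTime
    FROM scom.Alert AS a
    JOIN scom.Computer AS c ON c.ComputerId = a.ComputerId
    JOIN scom.Rule AS r ON r.RuleId = a.RuleId;
END;
```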

We also tried an approach where we had another db that synced with it, but the flattened-data approach saved our bacon.

Temporary tables are private to the session that created them. So you will hit the bottleneck again and again: as soon as one session ends, you will have to recreate them.

I’m actually okay with that (I suppose it depends on when said session starts/ends). If I can get the temp tables created when the dashboard is loaded, and then have each visual query the temp tables, I’m okay with that. What I don’t want is for each visual on a dashboard (I could have a dozen) to re-run the large and expensive queries, when one visual only differs from another by a slightly different WHERE clause, or by needing a JOIN with something else, depending on what I’m trying to get into the visual.

I came up with the idea of using temp tables only because I could not figure out how to “chain” queries in Grafana, where the first query only needs to be done once, and the second one (which is different for each visual) should run against the results of the first query. If that sort of thing is supported (rather than using temp tables), I’m still thinking this is what I need to investigate. Temp tables are only a means to an end, and I’m open to alternatives.


Flattened tables populated on a schedule. Again, it is not a Grafana issue; you’re better off asking in the following awesome forum: https://forums.sqlteam.com/