Calculation on data from multiple datasources

I’m having some issues doing a calculation which combines data from multiple databases, maybe someone here can help me out :slight_smile:

  • What Grafana version and what operating system are you using?
    Grafana Cloud (steady)

  • What are you trying to achieve?
    I have a ClickHouse database with usage statistics and a PostgreSQL database with user data. I want to calculate the percentage of users that have a certain amount of usage and display it in a stat panel.

  • How are you trying to achieve it?
    To link usage stats to individual users, I need to join the query results from both datasources. Without this I can’t get an accurate number of “active users”. This number I then want to use in a calculation with the total number of users to produce said percentage. At this step I no longer have access to the total number of users due to the join between the tables.

  • What happened?
    By joining the tables to figure out which users are active, using the Join by field transformation, I lose any other queried information, e.g. the full list of users.

  • What did you expect to happen?
    I would expect some way to join two tables, reduce that to a single number, then calculate a new one using that result together with the result from the original, or a completely other, query. Some way of doing things in the order query > query > transformation > query > transformation would probably solve my problem but atm I can’t seem to figure out any way to do that.

  • Can you copy/paste the configuration(s) that you are having problems with?
    Not really applicable, ask for more info and I’ll share it.

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    No.

  • Did you follow any online instructions? If so, what is the URL?
    Didn’t find any on this particular issue.

can you supply a simplified table view of the two sets of data?

Certainly. The queries and tables are quite simple really, I’m more having an issue of how to combine the results into the panel I want.

User data is a table of API keys and user ID:s (result of a query to a PostgreSQL DB) and the usage stats are per key and with a usage number (result of a query to a ClickHouse DB):


Since multiple keys can belong to the same user my thinking is to join these two on API keys and then use the Reduce transformation with Distinct Count to gain the unique number of users (with a certain usage level, defined in the CH DB query). After that, I somehow need to get access to the full number of users to be able to calculate a %. At least, that was what I was hoping to somehow achieve.

you are going to get duplicated data as your count is on api_key not username:

image

you are going to get duplicated data as your count is on api_key not username

Not sure what you mean by that, what data is duplicated?

The count is per field → extract the number of unique users that have API keys present in the fetched data. This is the transformation page:

Note, the result from this transformation is the number I want to divide by the total number of users. At this point - can I in some way get data from my PostgreSQL DB “again”?

These paths haven’t worked out so far:

  1. If I just add another query for the number I need, it doesn’t survive the JOIN.
  2. I haven’t found a transformation that can do what I want, e.g. “preserve” data from an earlier stage, or make another query after a transformation stage is complete.

see in my snip, api key1 was used 111 times, but because it is a total, it shows 111 by user1 and 111 by user2, because there is no link between api and users and count, only api and users, and api and count.

you can try use query variables, to run the queries you want and then use those variables in another query? so kind of a query based on results of another query, but you cant get data, process it and then fetch again based on the processed data…

I’ll try using query variables, though it feels like that could get messy if I need to create new variables for every other panel in my dashboards. But maybe it’s the only way.

but you cant get data, process it and then fetch again based on the processed data…

It’s not strictly this I need, more like I want to get two sets of data, apply some transformations to the first set and then other transformations to the result of that + the other set. E.g. process some data, then include the rest of the data.

Another way would be if one could select only certain series when joining, e.g. join queries A + B, but leave C independent. Can’t find that as a feature though.

Just rambling on here. Thanks for your help btw, much appreciated!

grafana is best at visualizing, leave it to do that and don’t burden it with messy stuff. when things start to get messy then it is not maintainable.

so on your click house db example you have a column called sum(requests)? is that really a real column name?

it would be best if you provided real DDL rather than images

ie

create table users(api_keys guid, user_id guid)

create table apis(api_keys  guid, ???)

then we can emulate on our side the db

grafana is best at visualizing, leave it to do that and don’t burden it with messy stuff. when things start to get messy then it is not maintainable.

Yeah, I’d prefer that! Just need some way to combine the data from these tables. I’m starting to think maybe I should create a small service which does what I want and puts it into another table. Unless there actually is a way in Grafana.

The column is “requests”, and I’m aggregating its contents using sum(). Pasting DDL examples below.

Data DB (ClickHouse)

CREATE TABLE analytics (api_key String, requests Int64, ...)

User DB (PostgreSQL).

CREATE TABLE api_keys (api_key String, user_id UUID, ...)

If it’s also helpful, the queries look like this (some non-relevant conditions stripped):

SELECT 
    api_key as api_keys,
    sum(requests)
FROM 
    analytics
GROUP BY
    api_key
HAVING 
    sum(requests) > 0
SELECT
    api_key as api_keys,
    user_id
FROM
    api_keys

So they are joined to enable me to count distinct user_id:s based on conditions placed on the number of requests in the analytics table.