Regex works fine with Grafana 3.2 but is very slow with Grafana 4.6

InfluxDB 1.4, Grafana 4.x (every 4.x version I tried).
I also ran into this problem with InfluxDB 1.2.2 and Grafana 4.x, which didn’t work either.
Works fine with Grafana 3.2 and InfluxDB 1.4.

I’m trying to upgrade to 4.6

I’ve had this problem for a while and have been trying to figure out a solution. I’m passing a large number of GUIDs to a regex, and the query takes 7 to 8 minutes to run. The InfluxDB CPU spikes for the whole time the query is running, so it isn’t waiting on anything. This works fine with Grafana 3.2 and InfluxDB 1.4, which is what I’m running in production. I can’t see any difference between the versions in the call Grafana makes. When I run “show queries” in InfluxDB, the query looks a lot like the 10-GUID query below, except that it has 3000 GUIDs.

(When I do a show queries from influx I see this query)

SELECT sum(success) / sum(total) * 100.000 AS success FROM rates WHERE id =~ /(3897f423-a4c1-4db2-b597-b2aee9485857|f3470a5f-20e6-496b-a490-de4541796748|2578a079-3dc6-448f-93e0-18915f429566|827f3a06-c9a4-4355-93a9-2202798441d5|fg013f23-62a8-4e10-94cd-f49ed3364b4d|530aed30-a58e-4b1f-bffa-7e06e885a55c|c908bad0-527b-483b-83f2-c18ee5179318|67b5cf45-b205-4295-820c-eba3515e713d|g096c2c0-b4c0-4db1-8304-dcea87c6ff99|25497141-5af2-4f0d-89b5-bb83b003a8f0)/ AND time > now() - 7d GROUP BY time(1d)

There are two reasons I have to use the regex: 1) I need to be able to group the data after collection, so I can’t do it with tags. 2) There is no “in” clause in InfluxQL.

The idFilter is generated by a SimpleJson datasource. I’m sure this is not causing the issue, because I see the query in the InfluxDB log, it looks correct, and it gets to InfluxDB pretty quickly when the dashboard is loading.

Here is the grafana query
SELECT sum(success) / sum(total) * 100.000 AS success FROM rates WHERE id =~ /$idFilter/ AND time > now() - 7d GROUP BY time(1d)
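
To reproduce the expanded query outside Grafana, something like the sketch below may help. It assumes the GUIDs are available one per line and that $idFilter expands to a parenthesized alternation, which is what the query in the InfluxDB log looks like. The function names and the guids.txt file are made up for illustration.

```shell
# Join GUIDs (one per line on stdin) into a parenthesized regex
# alternation such as (a|b|c). Assumption: this mirrors the shape of
# the value seen in the InfluxDB log; the input order is preserved.
build_id_filter() {
    printf '(%s)' "$(paste -sd'|' -)"
}

# Substitute the filter into the dashboard query text.
build_query() {
    printf 'SELECT sum(success) / sum(total) * 100.000 AS success FROM rates WHERE id =~ /%s/ AND time > now() - 7d GROUP BY time(1d)\n' "$1"
}
```

For example, `build_query "$(build_id_filter < guids.txt)" > query.txt` would write a query file that can be run directly with the influx CLI.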

Any idea how to figure out what’s wrong with this?

Thanks

Strange, so the query Grafana sends is the same but slower in 4.6? That makes no sense.

I believe it is. I will spend some time double-checking the query. It gets printed in the InfluxDB log. Unfortunately, this is running in our production environment, and none of our test environments have the same amount of data or the same horsepower. I have a limited window to upgrade production, test, then restore to 3.2. I also posted a similar item in the InfluxDB forum last June with no resolution.

Is there any logging I can turn on in grafana that may shed more light? The datasource is set up as proxied.

Thanks for your help

Interesting. I ran with 3.2 and 4.6 and found the query in each InfluxDB log. I cut out the queries and ran each one through this script:

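function timer()
{
    # With no argument, print the current epoch time in seconds.
    # With a start time as argument, print the elapsed time as h:mm:ss.
    if [[ $# -eq 0 ]]; then
        date '+%s'
    else
        local stime=$1
        local etime
        etime=$(date '+%s')

        if [[ -z "$stime" ]]; then stime=$etime; fi

        local dt=$((etime - stime))
        local ds=$((dt % 60))
        local dm=$(((dt / 60) % 60))
        local dh=$((dt / 3600))
        printf '%d:%02d:%02d' "$dh" "$dm" "$ds"
    fi
}

# Run the query saved in query.txt and report how long it took.
tmr=$(timer)
outfile=out.txt
sql=$(cat query.txt)
influx -database mydb -format csv -execute "$sql" > "$outfile"
printf 'Elapsed time: %s\n' "$(timer "$tmr")"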

The 3.2 query completed in 2 seconds.
The 4.6 query completed in 2:08 (2 minutes 8 seconds).

Looking at the queries, they appear to be identical.

As I looked for differences, I noticed that the 3.2 GUIDs are sorted in ascending order and the 4.6 ones are not. That looks like the only difference. I’m going to write a tool to sort the GUIDs in the 4.6 query, then resubmit it with the GUIDs in sorted order.
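
With 3000 GUIDs it is hard to spot an ordering difference by eye, so a check along these lines can help. This is only a sketch: it assumes each query file holds a single line with one `=~ /(...)/` regex, the function names are made up, and "sorted" here means plain byte order (LC_ALL=C), which may not be exactly the order Grafana 3.2 used.

```shell
# Print each alternative of the /(...)/ regex in a query (read from
# stdin), one per line.
extract_ids() {
    sed -n 's#.*=~ /(\([^)]*\))/.*#\1#p' | tr '|' '\n'
}

# Succeed if the alternatives in the query on stdin are already in
# ascending byte order.
ids_are_sorted() {
    extract_ids | LC_ALL=C sort -C
}

# Succeed if two query files contain the same set of ids, regardless
# of the order they appear in.
same_id_set() {
    [ "$(extract_ids < "$1" | LC_ALL=C sort)" = "$(extract_ids < "$2" | LC_ALL=C sort)" ]
}
```

For example, `ids_are_sorted < query.txt && echo sorted` reports whether a captured query has its list in order, and `same_id_set` takes two captured query files as arguments to confirm that they differ only in ordering.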

Yep, that is it.

In 3.2 the list is sorted and the query completes fast.
In 4.6 the list is unsorted and it takes over 2 minutes to complete.

I sorted the list and resubmitted the query, and it completed in 1 second.
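
A sorting tool for this can be quite small. The sketch below is not the exact tool used here, just one way to do it under a few assumptions: the query is a single line containing one `=~ /(...)/` regex, and ascending byte order (LC_ALL=C) is good enough. The function name is made up. Why the order matters to InfluxDB is not something this thread establishes.

```shell
# Read a query on stdin and print it with the alternatives inside
# /(...)/ rearranged into ascending byte order. A query without such
# a regex is printed unchanged.
sort_query_ids() {
    local query ids sorted prefix suffix
    query=$(cat)
    # Pull out the text between "/(" and ")/".
    ids=$(printf '%s\n' "$query" | sed -n 's#.*=~ /(\([^)]*\))/.*#\1#p')
    # Split on "|", sort, and join back with "|".
    sorted=$(printf '%s\n' "$ids" | tr '|' '\n' | LC_ALL=C sort | paste -sd'|' -)
    # Keep everything before and after the original list as it was.
    prefix=${query%%"$ids"*}
    suffix=${query#*"$ids"}
    printf '%s%s%s\n' "$prefix" "$sorted" "$suffix"
}
```

Running `sort_query_ids < query.txt > query_sorted.txt` produces a file that can be passed to the timing script in place of the original.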

I have a workaround now. I can move forward with 4.6. I’ll update the influx forum with my finding.