Count number of unique values for a key

Hi there,

I have log lines that look something like this:

{job="whatever",environment="test"} organization=one, user=A
{job="whatever",environment="test"} organization=one, user=A
{job="whatever",environment="test"} organization=two, user=Z
{job="whatever",environment="test"} organization=one, user=A
{job="whatever",environment="test"} organization=two, user=Z
{job="whatever",environment="test"} organization=one, user=B
{job="whatever",environment="test"} organization=one, user=B
{job="whatever",environment="test"} organization=one, user=B

What I need is the number of unique users per organization (and day). Is that even possible in LogQL? Query languages are definitely not my strength, but sadly I see absolutely no way of achieving this.

Thanks in advance.

Yes, this should be possible. I think something like this might be what you want:

sum by (organization, user) (count_over_time({job="whatever",environment="test"} | logfmt [1d]))

The catch here is that I cheated a little: the message examples you provided are not logfmt, so this would fail (logfmt does not use a comma delimiter).

So instead you would likely have to write a regex to do this:

sum by (organization, user) (count_over_time({job="whatever",environment="test"} | regexp ".*organization=(?P<organization>.*), user=(?P<user>.*).*"  [1d]))

I’m not sure that’s totally right, but it should help get you started.
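To see what the regex query actually returns, here is a small Python sketch that mimics it on the sample lines from the question. It assumes the stream labels are already stripped (only the message part is matched), reuses the same pattern with Python's `re` (which accepts the same `(?P<name>...)` named-group syntax), treats all sample lines as one [1d] window, and stands in for `sum by (organization, user) (count_over_time(...))` with a `Counter`. The names `LINES`, `PATTERN` and `count_by_org_and_user` are hypothetical; this illustrates the arithmetic, not how Loki evaluates queries.

```python
import re
from collections import Counter

# Log messages from the question (stream labels omitted for simplicity).
LINES = [
    "organization=one, user=A",
    "organization=one, user=A",
    "organization=two, user=Z",
    "organization=one, user=A",
    "organization=two, user=Z",
    "organization=one, user=B",
    "organization=one, user=B",
    "organization=one, user=B",
]

# Same pattern as the LogQL regexp stage; the named groups play the role of labels.
PATTERN = re.compile(r".*organization=(?P<organization>.*), user=(?P<user>.*).*")


def count_by_org_and_user(lines):
    """Hypothetical stand-in for sum by (organization, user) (count_over_time(...)):
    one entry per (organization, user) pair, valued by how many lines it appeared in."""
    counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:  # simplification: non-matching lines are just skipped here
            counts[(m.group("organization"), m.group("user"))] += 1
    return counts


if __name__ == "__main__":
    for (org, user), n in sorted(count_by_org_and_user(LINES).items()):
        print(f"organization={org} user={user} -> {n}")
```

On the sample data this yields one value per pair (one/A: 3, one/B: 3, two/Z: 2), i.e. how often each pair occurs, not how many distinct users each organization has.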

If it is at all possible to change your log lines to actual logfmt or JSON, you will have a much easier time of this, though: those parsers are faster and don’t require writing a regex 🙂

@ewelch I have a very similar requirement to the OP, but I want to see a result like:

Line #1 (for organization one) showing 2 (because the unique users are A and B)
Line #2 (for organization two) showing 1 (because only Z is there, though it appears twice)

The query in your example seems to yield, for each unique org and user pair, the number of times that pair appears in the log?

Sorry, I haven’t had a chance to look into this in a while. But yeah, @ewelch’s query doesn’t yield unique users per organization, and I’m not sure that’s possible with LogQL at all.

This seems to be an answer:

sum by (organization) (
  sum by (organization, user) (
    count_over_time(
      {job="whatever",environment="test"}
      | regexp ".*organization=(?P<organization>.*), user=(?P<user>.*).*"
      [1d]
    )
  ) ^ 0
)

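Pay attention to the ^ 0: raising each per-(organization, user) count to the power of zero turns it into 1, so the outer sum by (organization) adds one per distinct user instead of counting log lines.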
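Why ^ 0 works can be checked outside Loki. The Python sketch below uses the sample lines from the question, assumes the stream labels are already stripped and that all lines fall into one [1d] window, and mirrors the query stage by stage: per-(organization, user) counts, each raised to the power 0 (any non-zero number becomes 1), then summed per organization. The names `LINES`, `PATTERN` and `unique_users_per_org` are hypothetical, and this only illustrates the arithmetic, not Loki's evaluator.

```python
import re
from collections import Counter, defaultdict

# Log messages from the question (stream labels omitted for simplicity).
LINES = [
    "organization=one, user=A",
    "organization=one, user=A",
    "organization=two, user=Z",
    "organization=one, user=A",
    "organization=two, user=Z",
    "organization=one, user=B",
    "organization=one, user=B",
    "organization=one, user=B",
]

# Same pattern as the regexp stage in the query.
PATTERN = re.compile(r".*organization=(?P<organization>.*), user=(?P<user>.*).*")


def unique_users_per_org(lines):
    """Hypothetical helper mirroring the nested LogQL aggregation."""
    # Stage 1: sum by (organization, user) (count_over_time(...))
    pair_counts = Counter()
    for line in lines:
        m = PATTERN.match(line)
        if m:
            pair_counts[(m.group("organization"), m.group("user"))] += 1

    # Stage 2: "^ 0" turns every per-pair count into 1.0
    ones = {pair: float(n) ** 0 for pair, n in pair_counts.items()}

    # Stage 3: sum by (organization) adds up those 1s,
    # which is the number of distinct users seen per organization.
    per_org = defaultdict(float)
    for (org, _user), one in ones.items():
        per_org[org] += one
    return dict(per_org)


if __name__ == "__main__":
    for org, n in sorted(unique_users_per_org(LINES).items()):
        print(f"organization={org} -> {n:g}")
```

On the sample data this gives 2 for organization one (A and B) and 1 for organization two (Z), the result asked for earlier in the thread.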
