Filtering option for Elasticsearch datasource

Hi Team,
We designed a dashboard in Grafana with Elasticsearch as the data source. In the Metricbeat agent, a new field is added via metricbeat.yml with the config below:

fields:
  application: ["exxS-e11", "eBxxxxH-e11", "exxS-e10"]
fields_under_root: true

So in Kibana, the app info is displayed as below.

In Grafana, we created a variable to list the applications for filtering:
[image]

When we filter on any one application to get the unique count, the result by default includes the other two apps and displays the value “3”. This may be due to the cardinality aggregation in ES, but we want to filter and display the value “1”. Is this feasible?

Yes, you can do this. Since you have created a variable named application, put this variable in the where clause of your query, like
where application in $Variablename (the name you assigned to the variable). I hope this will be helpful.

Thanks. But this is for Elasticsearch as the source, which uses a Lucene query in Grafana. When we filter on one application, it shows a unique count of 3 (including the other 2 in the array), as shown below.

I plan to work with Elasticsearch for log files, but I have not used it yet. Anyway, thanks for the update. When you find a solution, let me know; it will be helpful for me. Thanks.

Hi @karthick2020,

Is 3 not the expected result? The screenshot in your original post shows 3 unique values for application.

I think you should use Count if you want to graph the occurrence of a specific $APPLICATION.

I don’t use Elasticsearch much as a Data Source so I am not 100% sure though…

Hi @b0b,
Yes, we expect the unique count to be “1”, since we apply a filter and search for a specific application from an array of strings.

As far as I know, Lucene/Elasticsearch does not work quite like that.

I’m sure you will see the same result if you query Elasticsearch directly like this

You search the application field for $APPLICATION, and the result is the number of unique application values found across all documents that matched application:$APPLICATION.

The result is application:["exxS-e11","eBxxxxH-e11","exxS-e10"], which is evaluated as 3.

That is exactly what I would expect Grafana to return for that query.
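To make the reasoning above concrete, here is a minimal Python sketch of the behaviour. It is not Elasticsearch code; it only imitates a filter on `application` followed by a Unique Count on the same field, and it assumes a single document carrying the three-value array from the original post. The function name `unique_count` is made up for this illustration.

```python
def unique_count(docs, field, selected):
    """Imitate a filter `field:selected` followed by Unique Count on `field`."""
    # The filter keeps whole documents whose array contains the selected value.
    matching = [d for d in docs if selected in d.get(field, [])]
    # The count then runs over ALL array entries of the matching documents,
    # not only over the value that was used in the filter.
    distinct = {value for d in matching for value in d[field]}
    return len(distinct)


# Assumed sample: one Metricbeat event with the array from metricbeat.yml.
docs = [{"application": ["exxS-e11", "eBxxxxH-e11", "exxS-e10"]}]

result = unique_count(docs, "application", "exxS-e11")
print(result)  # 3, because the matching document still holds all three values
```

The filter selects documents, not array elements, so every value in a matching document is still counted.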

Hi @b0b,
Thanks, but our team expects to see the response value as “1”. Are there any alternative options I can use to get this exact unique count (ignoring the combination) as “1”, either at the Elasticsearch level or in Grafana?

This is mainly down to Elasticsearch, but Unique Count in Grafana also does not do what you expect it to do.

Can you change application from an array to a hash?

Instead of this

application: ["exxS-e11", "eBxxxxH-e11", "exxS-e10"]

You would have

application:
  exxS-e11: true
  eBxxxxH-e11: true
  exxS-e10: true

Then you could query with _exists_:application.$APPLICATION
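As a rough sketch of the array-to-hash idea, the Python below converts the array form into the hash form and builds the matching `_exists_` query string. The helper names `array_to_hash` and `exists_query` are hypothetical, and whether application names containing characters such as `-` need escaping in a Lucene query has not been checked here.

```python
def array_to_hash(values):
    """Turn ["a", "b"] into {"a": True, "b": True}."""
    return {value: True for value in values}


def exists_query(field, app):
    """Build the Lucene query string suggested above for one application."""
    return f"_exists_:{field}.{app}"


apps = ["exxS-e11", "eBxxxxH-e11", "exxS-e10"]
application_hash = array_to_hash(apps)
query = exists_query("application", "exxS-e11")

print(application_hash)
print(query)
```

With the hash form each application becomes its own sub-field, so a query can test for one application without touching the others.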

Elastic forums have other suggestions

Hi @b0b,
Thanks for the suggestions.

  • I will check on using a hash (I also need to check its flexibility with automated deployment scenarios).
  • Of the two ES forum references you shared, I could try the first one and check whether the option below helps. But it may be challenging to handle via a Grafana template variable filter. We would also have to ensure that our filter or search always matches the expected response structure.

Configure:
PUT metricbeat/doc/1
{
  "application": [
    "exxS-E11",
    "ebxxxh-E10"
  ],
  "required_matches": 2
}

Search as:
GET metricbeat/doc/_search
{
  "query": {
    "bool": {
      "must": [{
        "terms_set": {
          "application": {
            "terms": ["exxS-E11", "ebxxxh-E10"],
            "minimum_should_match_field": "required_matches"
          }
        }
      }],
      "filter": [{
        "range": { "required_matches": { "gte": "2" }}
      }]
    }
  }
}

  • The second ES forum reference may consume more resources and lead to performance issues.
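The concern about the search always having to match the expected structure can be seen in a small Python imitation of the `terms_set` search above. This is only a sketch of the matching rule (a document matches when at least `required_matches` of the searched terms are present in its field), not Elasticsearch code; the function name `terms_set_match` is made up, and the range filter is modelled with plain integers.

```python
def terms_set_match(doc, field, terms, min_field, min_required=2):
    """Imitate terms_set on `field` plus a range filter on `min_field`."""
    # Number of searched terms that actually occur in the document's array.
    matched = len(set(doc.get(field, [])) & set(terms))
    # terms_set: the per-document threshold is read from `min_field`.
    enough_terms = matched >= doc[min_field]
    # range filter: required_matches >= 2 in the search above.
    passes_filter = doc[min_field] >= min_required
    return enough_terms and passes_filter


doc = {"application": ["exxS-E11", "ebxxxh-E10"], "required_matches": 2}

both = terms_set_match(doc, "application", ["exxS-E11", "ebxxxh-E10"], "required_matches")
single = terms_set_match(doc, "application", ["exxS-E11"], "required_matches")

print(both)    # True: both stored values were searched for
print(single)  # False: only one of the two required terms was supplied
```

Selecting a single application in a template variable would therefore not match a document that requires two terms, which is the difficulty noted above.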

Maybe adding metadata from a process is easier than adding fields directly which might not be very dynamic…

https://www.elastic.co/guide/en/beats/metricbeat/current/add-process-metadata.html

I use metadata fields in Logstash and they can be dynamically updated and changed. Maybe that can work for Metricbeat as well.

Thanks @b0b. Initially I was planning to use process metadata, but I refrained and used fields for unique tracking of certain additional details such as application. What is the major difference between metadata and fields? Could you please clarify, or share a reference that explains the difference?

Not sure how well this works for metricbeat… Metadata is mostly internal only to the service in question.

When I use metadata fields in Logstash I can set them on different inputs for different kinds of logs. Then I use the metadata to set which Elasticsearch index the messages are routed to. My Logstash output looks like this

output {
  elasticsearch {
    hosts => ["10.0.0.1:9200"]
    index => "%{[@metadata][log_prefix]}-%{[@metadata][index]}-%{+yyyy.MM.dd}"
  }
}

This is just to illustrate how I use them. With this I do not need to use conditionals based on field values to route certain logs to certain indices…
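For readers unfamiliar with that index pattern, here is a small Python sketch of how such a name resolves for one event. It only imitates the substitution; the metadata values `web` and `nginx`, the timestamp, and the function name `resolve_index` are invented for the illustration, and the date part is assumed to come from the event timestamp in UTC.

```python
from datetime import datetime, timezone


def resolve_index(metadata, timestamp):
    """Build '<log_prefix>-<index>-<yyyy.MM.dd>' for one event."""
    # Date part of the pattern, taken from the event timestamp in UTC.
    day = timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{metadata['log_prefix']}-{metadata['index']}-{day}"


# Hypothetical metadata that an input would have attached to the event.
event_metadata = {"log_prefix": "web", "index": "nginx"}
event_time = datetime(2020, 5, 17, 12, 0, tzinfo=timezone.utc)

index_name = resolve_index(event_metadata, event_time)
print(index_name)  # web-nginx-2020.05.17
```

Because the index name is derived from per-event metadata, no conditionals on field values are needed in the output section.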

I was imagining that maybe this would be possible in metricbeat

  fields:
   application: %{[@metadata][application_id]}

Something like that but I can not find if it is possible in metricbeat or not… And how you would assign the value for the metadata field…

Which is why I suggested Add process metadata as that is something that is documented. Or if you are running containers there is also Add Docker metadata.

That is about as much as I know on this topic.

Hope that helps.

Hi @b0b,
Thanks. I tried the suggested hash, but in combination with an array (to handle the template variable), as shown below

fields:
  application:
    exxxx11: true
    eBxxxxx11: true
  application1: [exxxx11,eBxxxx11]
fields_under_root: true

application - for the hash
application1 - array retained only to handle “template variable” filtering
But it does not help when I try to get the count dynamically with the filter option (from the array-based field, used only for the template variable), as referred below

Hi @b0b, could you please check the hash-based approach I shared above and clarify?

I’m doing some tests @karthick2020,

I have not worked much with Elasticsearch as a Data Source…

This is my test data

{"doc_nr": 1,"app1":true,"applications":["app1"]}
{"doc_nr": 2,"app2":true,"applications":["app2"]}
{"doc_nr": 3,"app3":true,"applications":["app3"]}
{"doc_nr": 4,"app1":true,"app2":true,"applications":["app1","app2"]}
{"doc_nr": 5,"app3":true,"app2":true,"applications":["app3","app2"]}
{"doc_nr": 6,"app3":true,"app1":true,"app2":true,"applications":["app3","app2","app1"]}
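To see what different metrics would return on the test data above, here is a minimal Python sketch that loads the six documents and compares three counts for a selected app. It is a plain in-memory imitation, not a Grafana or Elasticsearch query, and the function names are made up for this example.

```python
import json

RAW = """\
{"doc_nr": 1,"app1":true,"applications":["app1"]}
{"doc_nr": 2,"app2":true,"applications":["app2"]}
{"doc_nr": 3,"app3":true,"applications":["app3"]}
{"doc_nr": 4,"app1":true,"app2":true,"applications":["app1","app2"]}
{"doc_nr": 5,"app3":true,"app2":true,"applications":["app3","app2"]}
{"doc_nr": 6,"app3":true,"app1":true,"app2":true,"applications":["app3","app2","app1"]}
"""
docs = [json.loads(line) for line in RAW.splitlines()]


def count_docs(docs, app):
    """Number of documents whose array contains `app` (a plain Count)."""
    return sum(1 for d in docs if app in d["applications"])


def unique_apps_after_filter(docs, app):
    """Distinct array values across the documents that contain `app`."""
    matching = [d for d in docs if app in d["applications"]]
    return len({a for d in matching for a in d["applications"]})


def unique_selected(docs, selected):
    """Distinct values, counting only apps that are in the selection."""
    wanted = set(selected)
    return len({a for d in docs for a in d["applications"] if a in wanted})


print(count_docs(docs, "app1"))                # 3 (documents 1, 4 and 6)
print(unique_apps_after_filter(docs, "app1"))  # 3 (app1, app2 and app3 co-occur)
print(unique_selected(docs, ["app1"]))         # 1
```

Only the last function gives the “1” that was asked for, because it restricts the counted values themselves rather than just the documents. Whether the Grafana query editor can express that restriction is not established in this thread.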

I did not get the grouping to work the way I wanted… I haven’t used it before like this…

It is also not possible to choose several apps the way I did it…

Thanks @b0b. Is the grouping you are referring to about selecting multiple apps from the template variable and getting the exact unique count based on that multiple selection?

Hi @karthick2020,

I should have written “Group by” instead of grouping :)

The third row in the query editor. With a short interval when I used “Group by” Date Histogram I got a float instead of 0 or 1. I guess it was the average over the time range when split into interval sized buckets, if that makes sense…
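The guess above (an average per interval-sized bucket) can be illustrated with a short Python sketch. It assumes, purely for illustration, that each data point is a timestamp in seconds with a value of 1 or 0, and that the metric is an average per bucket; the thread does not confirm that this is what Grafana computed. The function name `date_histogram_avg` is hypothetical.

```python
from collections import defaultdict


def date_histogram_avg(points, interval):
    """Group (timestamp, value) pairs into fixed buckets and average each."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Bucket key is the start of the interval the timestamp falls into.
        buckets[timestamp - timestamp % interval].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Assumed sample: three points in the first minute, one in the second.
points = [(0, 1), (10, 0), (20, 1), (70, 1)]
averages = date_histogram_avg(points, 60)
print(averages)
```

A bucket that mixes 1 and 0 values averages to a fraction, which would explain seeing a float rather than 0 or 1.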

This is unfortunately not a problem I personally need solving at the moment, and I don’t have time for more testing as I have other priorities…

As I mentioned before, I have no direct experience of doing exactly what you are doing, so everything I have written has been a theoretical suggestion of what could work.

Good luck :) Hopefully you get it to work the way you expect it to.

Thanks @b0b. Did you configure that test data directly in metricbeat.yml or via the API? I tried a similar configuration in metricbeat.yml, but the config file does not load when starting Metricbeat.