Replace random ids in the logs with static text to aggregate them together

I have logs containing URLs with ids in them, e.g.:

/dataservices/journeys/0acac220-06c8-49d5-9d4c-37274dbe576c/journey-deactivation
/dataservices/journeys/3b9315c3-87ae-4d41-b463-f965a1be18d6/journey-deactivation
/dataservices/journeys/5e9c65de-be45-4715-aae1-1767459a2f78/journey-deactivation
/dataservices/journeys/7defe232-05d9-51d7-9e4a-12275dcm586c/journey-activation
/dataservices/journeys/4b324406-8fdd-43fa-a3e6-e275d1b83037/journey-activation
/dataservices/journeys/12342sdfsd2342dfg3

and so on. I am trying to build a dashboard that does aggregations on these URLs. Currently, if I do a count by label, each of these gives a count of 1. However, I am interested in the type of URL and not the individual URLs themselves, so I want to replace those ids with some plain text, e.g.:

/dataservices/journeys/cid/journey-deactivation
/dataservices/journeys/cid/journey-activation
/dataservices/journeys/gid

so that when I do the count by label, I get 3, 2, 1 respectively.
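
To make the target concrete, here is a minimal sketch in Python (outside Grafana/Loki) of the normalization I am after. The `normalize` helper and the `CID` pattern are only illustrative assumptions: I treat any 8-4-4-4-12 group of lowercase letters and digits as a "cid" (looser than strict hex, because one of my sample ids contains an "m") and any other id segment as a "gid".

```python
import re
from collections import Counter

# Sample paths copied from the log lines above.
PATHS = [
    "/dataservices/journeys/0acac220-06c8-49d5-9d4c-37274dbe576c/journey-deactivation",
    "/dataservices/journeys/3b9315c3-87ae-4d41-b463-f965a1be18d6/journey-deactivation",
    "/dataservices/journeys/5e9c65de-be45-4715-aae1-1767459a2f78/journey-deactivation",
    "/dataservices/journeys/7defe232-05d9-51d7-9e4a-12275dcm586c/journey-activation",
    "/dataservices/journeys/4b324406-8fdd-43fa-a3e6-e275d1b83037/journey-activation",
    "/dataservices/journeys/12342sdfsd2342dfg3",
]

# Assumption: a "cid" is any 8-4-4-4-12 group of lowercase letters/digits.
CID = re.compile(r"^[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}$")


def normalize(path: str) -> str:
    """Replace the id segment that follows /journeys/ with a static placeholder."""
    parts = path.split("/")
    # parts looks like ["", "dataservices", "journeys", "<id>", ...]
    if len(parts) > 3 and parts[2] == "journeys":
        parts[3] = "cid" if CID.match(parts[3]) else "gid"
    return "/".join(parts)


# Count by normalized URL instead of by raw URL.
counts = Counter(normalize(p) for p in PATHS)
for url, n in counts.items():
    print(n, url)
```

With the six sample lines this groups them into the three URL types listed above.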

I thought the value mapping feature would help, but it doesn’t do anything to the aggregations. Any help is much appreciated.

Value mapping does help with how values are displayed, but mapped values indeed won’t help with aggregation, because they are only used for the visualization.

The preferred approach would be to do everything you need at the datasource level.

You can use a regex to capture whatever follows the last /, but your last log line is problematic because it is missing the final component. The best solution I can think of is to use regexp first, then label_format with a default value for when the label is not set (to handle cases such as the last line). Note that you have to fix the number of path components in the pattern; otherwise, on your last log line the capture group will match the ID instead.

Example:

sum by (type_cleansed) (
  count_over_time(
    {<SELECTOR>}
    | regexp `^\/.+\/.+\/.+\/(?P<type>[^\/]*)`
    | label_format type_cleansed=`{{ .type | default "unknown"  }}`
    [$__interval]
  )
)
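
If you want to check the expression before putting it in a dashboard, here is a rough Python sketch that mimics the regexp and default steps of the query above on your sample lines. It assumes each log line is exactly the URL path, which may not be true for your real logs, and it uses Python's re module rather than the Go regex engine that Loki uses (the two should agree on a pattern this simple). The `type_cleansed` function name is just for illustration.

```python
import re
from collections import Counter

# Same expression as in the query above: three leading path components,
# then the final component captured as "type".
PATTERN = re.compile(r"^\/.+\/.+\/.+\/(?P<type>[^\/]*)")

# Sample paths from the question (assumed to be the whole log line).
PATHS = [
    "/dataservices/journeys/0acac220-06c8-49d5-9d4c-37274dbe576c/journey-deactivation",
    "/dataservices/journeys/3b9315c3-87ae-4d41-b463-f965a1be18d6/journey-deactivation",
    "/dataservices/journeys/5e9c65de-be45-4715-aae1-1767459a2f78/journey-deactivation",
    "/dataservices/journeys/7defe232-05d9-51d7-9e4a-12275dcm586c/journey-activation",
    "/dataservices/journeys/4b324406-8fdd-43fa-a3e6-e275d1b83037/journey-activation",
    "/dataservices/journeys/12342sdfsd2342dfg3",
]


def type_cleansed(line: str) -> str:
    """Extract the final path component, falling back to "unknown"."""
    match = PATTERN.search(line)
    # Mirrors `default "unknown"`: used when the label is missing or empty.
    return (match.group("type") if match else "") or "unknown"


# Rough equivalent of sum by (type_cleansed) (count_over_time(...)).
totals = Counter(type_cleansed(p) for p in PATHS)
print(totals)
```

Because the pattern requires four slashes, the last sample line does not match at all and lands in the "unknown" bucket instead of having its id captured as the type.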