System restarts reset counters to zero / lots of time series

  • What Grafana version and what operating system are you using?
    9.4.7 Cloud

  • What are you trying to achieve?
    Produce a dashboard (and report) showing how many bytes (of audio) are processed by a particular system. I want to specify the time period and ignore things like system restarts that reset the count to zero. We want to bill the customer based on the bytes we processed on their behalf on a monthly basis.

  • How are you trying to achieve it?
    I applied a Time Series graph to a metric (an OpenTelemetry counter) called xxx_bytes.

  • What happened?
    The metric carries attributes with highly specific values, such as span id, trace id and session id. As a result, Grafana creates one time series for each unique combination of user session, span id, trace id, and so on. By default, it looks like this:

Note that those time series could represent sessions that ended days ago. When a session ends, this is the code that the system executes (.NET OpenTelemetry API):

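// Session ended: one fewer active operation.
ActiveOperationsUpDownCounter.Add(-1, ctx?.MetricTags);
// Add this session's size in KB; if bytes is an integer type, the division truncates.
TotalKilobytesCounter.Add(bytes / 1024, ctx?.MetricTags);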

“TotalKilobytesCounter” is the source of the xxx_bytes metric. “MetricTags” includes all the attributes, including the session id, trace id and span id. As far as I can tell, that is how we end up with such “unique” time series. The only way for one of those per-session series to expire is for the system to restart.

Grafana rightly said “Many time series results returned. Consider aggregating with sum().” I did that, and it is now a single time series, which is good.

However, restarting the system resets everything back to zero. When users start sending data again, the value will start increasing again. Here is a graph showing a system restart (with sum() applied):

I have two questions:

  1. Is there anything else I should be doing (beyond just sum()) to this data to clean it up and remove the “uniqueness”?
  2. How can I ignore the system restarts so the values don’t reset to 0?
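
To make question 2 concrete, here is a minimal sketch of the behavior I am after, written in Python rather than as a Grafana query. It assumes the summed counter values arrive as a plain list of samples in time order, and it treats any drop in value as a restart from zero. The function name and the sample values are made up for illustration; none of this comes from our system.

```python
from typing import Sequence


def total_increase(samples: Sequence[float]) -> float:
    """Total growth of a cumulative counter, treating any drop as a reset to zero."""
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            # Normal case: the counter only went up.
            total += curr - prev
        else:
            # Assumed restart: the counter began again at zero,
            # so everything in the current sample is new growth.
            total += curr
    return total


# Hypothetical summed values around one restart (drop from 400 to 0).
samples = [100, 250, 400, 0, 30, 90]
print(total_increase(samples))  # 390.0
```

Anything counted between the last sample before a restart and the restart itself is lost in this sketch, which I could live with for billing if the sampling interval is short.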

I’m a newbie, so sorry if this is basic.

Thanks,

Brian

Similar issue. We have a counter that resets after a maintenance patch of the cluster. We cannot track or aggregate how many counts of each type occurred in a given time span if the counter resets within the reporting period.

Hi @kenburk1,

Thanks for the feedback. Do you also experience this issue while using Grafana Cloud?