How do you track state machine states: metrics or logs?

I’m setting up Grafana as a monitoring tool for a fleet of robotic systems.

Each robot runs Alloy as a telemetry forwarder, sending data to a Grafana Cloud instance. Metrics are sent via OTLP, and Alloy is also configured to tail logs.

The robotic system includes a state machine that I want to monitor over time. Right now, I’m parsing these states from logs using LogQL, but this approach frequently exceeds the fair use query limits, which makes me think I’m doing it the wrong way.

Would it make more sense to expose the system state as a metric, perhaps casting the states as an enum?

My concern with that approach is versioning: if I add or modify states later, I’d have to update the metric schema and adjust visualizations to handle multiple “versions” of the state machine. I have also already defined the states as an enum in my code, so it feels odd to define them in two places. It all seems a little messy, which makes me wonder whether this is a good route either.

Has anyone found a clean way to handle evolving state machines in Grafana while keeping dashboards maintainable and query costs reasonable? Something like a “string” metric?

Can you share what your logs look like and what query you are using for alerts, please?

Normally a metrics query incurs less data usage than a log query. You can definitely consider changing the state monitoring from logs to metrics, and you don’t necessarily need to pre-define an enum for the state type. For example, let’s say you have the states “fail” and “success”; you can produce metrics like so:

# if success
machine_state{state="success",machine_id="123"} 1

# if fail
machine_state{state="fail",machine_id="123"} 1

Then you can match the state at query time to check for either success or fail. This also makes changing the states easier, because you don’t need to touch the machines; you just need to adjust your alerts.
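For example (untested, and using the example metric and label names from the snippet above), a PromQL query like this would match machines currently in the fail state:

machine_state{state="fail"} == 1

and you could count how many machines are failing at once with:

count(machine_state{state="fail"} == 1)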

Hi @tonyswumac thanks for answering!

I am not using an alert; I am using a LogQL query directly to generate the input for the visualization, with a line limit of one, since I thought that might decrease the incurred query size.

{hostname="system-hostname"} |= `publishing system state` | pattern `<timestamp> [<node>] <misc>: <mis1>: <system_state>e[0me[0m`

which I use to parse a log line that looks like this:

1761027275.4510579 [system_monitor_node-52] [INFO] [1761027275.442901531] [system_monitoring_node]: publishing system state: IDLE

According to Grafana Explore “This query will process approximately 8.4 MiB”, but this is also probably because the logs are rather verbose.

With the solution that you suggest I would be creating an active series per state, which I guess is fine, but is it then still possible to create a state timeline?

I see, didn’t realize you have logs in Loki already.

Let’s say we have the following log line:

1761027275.4510579 [system_monitor_node-52] [INFO] [1761027275.442901531] [system_monitoring_node]: publishing system state: IDLE

What is the goal you want to achieve?

I want to be able to track the state that the system is in, and be able to see it within a resolution of 1 DPM (one data point per minute). From this I think I can then also derive other metrics, such as setting up alerts that will let me know when a system has stayed in a certain state for too long, or whether the system has failed a mission.

Do you think that this is the wrong way of handling it? I guess it is also possible to do the state tracking on the device and only output a metric if it is over a certain threshold.

I am surprised by how many hurdles I need to overcome to make this work, leading me to believe that the design of Grafana is actively discouraging me from doing so.

The “fair use query limit” is not a design limit of Grafana; it is a Grafana Cloud limit.
Use your own metric/log storage and you can query it nonstop without reaching the fair use query limit. You just need to manage your own storage (e.g. Loki if you still want to use logs).

As Jan mentioned above, you are using a free version of Grafana Cloud, so limitations are expected. If you go over them, the only thing you can really do is reduce the log volume or make the logs less verbose.

To address your other questions: in order to create a time series graph you will need to map each state to some sort of numeric value. I don’t know what possible states you have, but just as an example let’s say:

IDLE = 0
ACTIVE = 1
FAILED = -1
UNKNOWN = -2

Then you’d need to craft a query like this (not tested)

{hostname="system-hostname"}
  |= `publishing system state`
  | pattern `<timestamp> [<node>] <misc>: <mis1>: <system_state>e[0me[0m`
  | label_format system_state_int=`{{ if eq .system_state "IDLE" }}0{{ else if eq .system_state "ACTIVE" }}1{{ else if eq .system_state "FAILED" }}-1{{ else }}-2{{ end }}`
  | unwrap "system_state_int"

And then you can wrap metric functions such as sum_over_time or avg_over_time around that entire thing, now that it’s producing a number.
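For instance, the whole query wrapped in avg_over_time (also untested; the 1m range is just picked to line up with your 1 DPM goal) would look roughly like:

avg_over_time(
  {hostname="system-hostname"}
    |= `publishing system state`
    | pattern `<timestamp> [<node>] <misc>: <mis1>: <system_state>e[0me[0m`
    | label_format system_state_int=`{{ if eq .system_state "IDLE" }}0{{ else if eq .system_state "ACTIVE" }}1{{ else if eq .system_state "FAILED" }}-1{{ else }}-2{{ end }}`
    | unwrap system_state_int [1m]
)

which produces a numeric series you can feed into a time series or state timeline panel.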

If you are doing alerts then you don’t need to be quite as elaborate; you could just compare the state string and alert based on your desired alerting conditions.
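For example, a Loki alert rule condition along these lines (again not tested; the 5m window and the FAILED state are just placeholders for whatever your real states are) would fire whenever a FAILED state line shows up:

count_over_time(
  {hostname="system-hostname"} |= `publishing system state: FAILED` [5m]
) > 0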

I am hesitant to host my own metric/log storage, since it will be another thing to maintain.

@tonyswumac regardless of the Grafana Cloud tier that I am using, the fair use policy applies (link).

I think that I will go the route of making a metric per system state, since even with 10 states I have more than enough room within the 10k active series that are part of the free tier.

I am currently thinking of naming the state metrics by prefixing them with “system_state.”. I can then query Prometheus with the following:

{__name__=~"system_state.*"} 

This will return a series per state.
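Just to illustrate what I expect back (the state names are made up, and I am assuming each state metric is 1 while the system is in that state and 0 otherwise):

system_state.idle{hostname="system-hostname"}   1
system_state.active{hostname="system-hostname"} 0
system_state.failed{hostname="system-hostname"} 0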

I can then apply three transformations:

This is showing some promise.

What do you guys think of this approach?

Yes, I think that’s a fine solution. And if you are approaching the limit, it helps that, generally speaking, metrics take up less volume than logs.