Understanding my first Grafana project: monitoring release environments

  • What Grafana version and what operating system are you using?
    Grafana 7.5.9

  • What are you trying to achieve?
    The goal is to monitor TFS release/deployment pipelines, which can consist of multiple steps called environments. Hundreds of environments are deployed every night, and among other things they deploy and run automated tests. TFS is not great at monitoring or filtering results. That’s where Grafana comes in: I would like a quick list of all recently failed releases and, in addition, a failure rate, preferably as a trend over several days. In other words: are we seeing more or fewer failures in automated test runs compared to, e.g., last week?

  • How are you trying to achieve it?
    I thought the results of our release/deployment/automated test runs, queried from TFS, would make a nice metric. I wrote code that fetches the results from TFS for a period such as the last 14 days. Both the failed and the successful runs are pushed into InfluxDB using [1]. That works quite well, and I am able to generate the failure rate and the list of failed runs. Since I push the data into InfluxDB myself, I can use the TFS release creation date as the InfluxDB timestamp, so the data in Grafana is timed exactly as in TFS. I thought I should be able to achieve all of this with a single measurement.

  • What happened?
    In TFS release pipelines you can re-deploy a failed run with exactly the same settings to “try again”, which makes sense for some kinds of errors (e.g. a network timeout). This means that for some release pipeline environments I first receive the result “failed” (together with the timestamp, the release name, and other values as InfluxDB tags), and later, once someone has successfully re-deployed the environment, the same data with the field state set to “passed”.

  • What did you expect to happen?
    I was somehow expecting “passed” to overwrite the “failed” value, since everything else is the same. I am unsure whether that would be good or bad for my use case. Because I want to track the failure rate, I definitely want to keep count of how many “failed” environments we have had in the past. However, I also want a list of recently failed environments, and that list fills up with runs that have since been re-deployed successfully. Do you think I need two measurements? Is there a conceptual flaw in my thinking?
    In other words: in one case I want the “failed” runs to be overwritten by “successful” runs, and I don’t know exactly how to achieve this. In the other case I don’t want them to overwrite each other.

  • Can you copy/paste the configuration(s) that you are having problems with?
    I think my question is more on a conceptual level.

  • Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
    No

  • Did you follow any online instructions? If so, what is the URL?
    Not really. Some reading here and there.

[1] influxdata/influxdb-client-csharp (GitHub): InfluxDB 2.0 C# Client
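Whether “passed” replaces “failed” depends on InfluxDB’s duplicate-point rule: a point is identified by its measurement, tag set and timestamp, and writing a second point with the same three merges the field sets, with the newer value winning for any field present in both. The sketch below is a minimal, language-neutral illustration in Python (not the C# client from [1]); the measurement name release_runs, the tag names and the timestamp are hypothetical, and the in-memory dict only simulates the storage rule, it is not an InfluxDB API. If the retry is written with state as a tag, or with a different timestamp (e.g. the re-deployment date instead of the release creation date), it may land as a separate point, which could explain why nothing was overwritten.

```python
from typing import Dict, Tuple, Union

FieldValue = Union[str, int, float, bool]
# A point is identified by measurement, (sorted) tag set and timestamp in ns.
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]


def _escape_key(s: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(v: FieldValue) -> str:
    """Format a field value: bool, integer (i suffix), float or quoted string."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"
    if isinstance(v, float):
        return repr(v)
    return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, FieldValue], ts_ns: int) -> str:
    """Build one line-protocol line, with tags sorted by key."""
    head = measurement + "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items()))
    body = ",".join(f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {ts_ns}"


def write(store: Dict[PointKey, Dict[str, FieldValue]], measurement: str,
          tags: Dict[str, str], fields: Dict[str, FieldValue], ts_ns: int) -> None:
    """Simulate the duplicate-point rule: same key -> merge fields, newer wins."""
    key = (measurement, tuple(sorted(tags.items())), ts_ns)
    store.setdefault(key, {}).update(fields)


ts = 1_625_000_000_000_000_000  # release creation date in ns (made-up value)
tags = {"release": "Release-42", "environment": "AutoTests"}
print(to_line_protocol("release_runs", tags, {"state": "failed"}, ts))
# release_runs,environment=AutoTests,release=Release-42 state="failed" 1625000000000000000

# Case 1: state is a field and the timestamp is reused -> the retry overwrites.
store: Dict[PointKey, Dict[str, FieldValue]] = {}
write(store, "release_runs", tags, {"state": "failed"}, ts)
write(store, "release_runs", tags, {"state": "passed"}, ts)
print(len(store), list(store.values())[0]["state"])  # 1 passed

# Case 2: state is a tag -> two distinct series, both points are kept.
store2: Dict[PointKey, Dict[str, FieldValue]] = {}
write(store2, "release_runs", {**tags, "state": "failed"}, {"runs": 1}, ts)
write(store2, "release_runs", {**tags, "state": "passed"}, {"runs": 1}, ts)
print(len(store2))  # 2
```

In other words, the overwrite is not something to request explicitly; it follows from which values end up in the tag set and the timestamp.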

Perhaps I am not understanding you, @tomwaitforitmy, but this should be possible by using the same query duplicated across several panels, each transforming and presenting the data in a different way.
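The idea of one dataset feeding two panels can be sketched in plain Python, assuming every deployment attempt (including each retry) is stored as its own point with its own time. The Run record and its field names are hypothetical, and the two functions only illustrate the transformations the panels would perform; in Grafana the “latest attempt” view might be expressed with something like InfluxQL’s LAST() grouped by the release and environment tags, but the code below is not a Grafana or InfluxDB API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Run:
    """One deployment attempt of one environment (hypothetical record shape)."""
    release: str
    environment: str
    state: str        # "failed" or "passed"
    time: datetime    # when this attempt ran


def daily_failure_rate(runs: List[Run]) -> Dict[date, float]:
    """Panel 1: every attempt counts, so failures that were later retried stay in the rate."""
    totals: Dict[date, int] = defaultdict(int)
    failures: Dict[date, int] = defaultdict(int)
    for r in runs:
        day = r.time.date()
        totals[day] += 1
        if r.state == "failed":
            failures[day] += 1
    return {d: failures[d] / totals[d] for d in sorted(totals)}


def currently_failed(runs: List[Run]) -> List[Tuple[str, str]]:
    """Panel 2: keep only the latest attempt per (release, environment)."""
    latest: Dict[Tuple[str, str], Run] = {}
    for r in runs:
        key = (r.release, r.environment)
        if key not in latest or r.time > latest[key].time:
            latest[key] = r
    return sorted(k for k, r in latest.items() if r.state == "failed")


runs = [
    Run("Release-41", "AutoTests", "failed", datetime(2021, 7, 1, 2, 0)),
    Run("Release-41", "AutoTests", "passed", datetime(2021, 7, 1, 9, 30)),  # retry
    Run("Release-41", "Staging", "failed", datetime(2021, 7, 1, 2, 10)),
    Run("Release-42", "AutoTests", "passed", datetime(2021, 7, 2, 2, 0)),
]
print(daily_failure_rate(runs))
print(currently_failed(runs))  # [('Release-41', 'Staging')]
```

Keeping every attempt preserves the history the failure rate needs, while the “recently failed” list is derived at query time, so under these assumptions a second measurement is not strictly required.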

Perhaps it would be useful to move this away from the conceptual realm. Have you configured the InfluxDB data source in Grafana? If you can query your data and view it in a table panel, please share a screenshot or, even better, the raw query and response from the panel’s inspect drawer :+1: