Purge data from the default Prometheus instance

Hi,

Playing with Prometheus in Grafana Cloud. So far, so good. But Prometheus contains a lot of bogus data. Is there a way to clean up (purge) this data so I can start fresh?

Hi @sharedps. There's no way to purge data that you've already ingested into the Prometheus endpoint at Grafana Cloud; however, you can refine and control any future "bogus data" using these helpful guides to analyze and reduce the Prometheus metrics being sent via the remote_write method: Prometheus | Grafana Labs
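
As a rough illustration of the "analyze" step (this is just a sketch, not from the guide above; it assumes you have a local Prometheus reachable at localhost:9090), you can ask Prometheus which metric names contribute the most series and use that to decide what to drop from remote_write:

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical local Prometheus address -- adjust to your environment.
    PROM_URL = "http://localhost:9090"

    # PromQL: count series per metric name and keep the 20 largest contributors.
    query = 'topk(20, count by (__name__) ({__name__!=""}))'
    url = PROM_URL + "/api/v1/query?" + urllib.parse.urlencode({"query": query})

    with urllib.request.urlopen(url) as resp:
        result = json.load(resp)["data"]["result"]

    # Print each metric name and its series count, largest first.
    for sample in sorted(result, key=lambda s: float(s["value"][1]), reverse=True):
        print(sample["metric"].get("__name__", "?"), sample["value"][1])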

Wow! That's a big con for the cloud version in that case. But thanks for answering anyway. I think it's better/quicker for me in this case to remove this stack and create a new one.

Definitely a major obstacle to playing with Grafana Cloud before migrating production data. There's no way we'll commit to moving tens of millions of time series instances with terabytes of data per year to the cloud without being able to do basic things like this first. It's a great example of why to keep data ownership local and not outsource it. I can't believe more people haven't complained about this in the last 2 years; it came up on my first day of playing with a cloud instance.

I tried populating a small subset of one day's worth of data in one shot, hit the rate limiting problem, and took a few minutes to figure out how to regulate that on my side. Meanwhile I populated one or two datapoints in the last hour, and it won't let me backfill the prior 24 hours of data.

So I have to create an entirely new instance of Grafana Cloud each time I encounter some new limitation while I'm exploring whether we want to consider using it for millions of metric instances and terabytes of data per year? What happens if we have an operational issue 3-6 months into production and we need to backfill some data? I don't want to wipe the stack and start over backfilling data from day 1. This is not ready for prime time unless there is something I'm missing about how I should use the service.
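
On the rate limiting point, the kind of client-side regulation I mean looks roughly like this (an illustrative sketch only, not our actual pipeline code; the budget number is made up):

    import time

    MAX_SAMPLES_PER_SEC = 5000  # made-up budget; tune to the actual ingest limit

    def send_paced(batches, send_batch):
        """Send each batch via send_batch(), sleeping as needed so the
        overall rate never exceeds MAX_SAMPLES_PER_SEC."""
        for batch in batches:
            start = time.monotonic()
            send_batch(batch)  # e.g. hand the batch to the relay
            min_duration = len(batch) / MAX_SAMPLES_PER_SEC
            elapsed = time.monotonic() - start
            if elapsed < min_duration:
                time.sleep(min_duration - elapsed)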

Hey Scott, thanks for posting! I’m a member of the Customer Success team here at Grafana. Wanted to recap a few of the points you made to make sure I understand the problem.

  1. No way to clean up unused time series
  2. No way to backfill historical data
  3. Rate limiting

Sorry you’re running into these issues! Fortunately we can help here.

  1. This year we introduced Adaptive Metrics, which helps you manage your metrics cardinality to minimize your costs
  2. You are able to backfill your historical time series data from Prometheus by working with our team
  3. These rate limits are in place on self-serve free trials. Sounds like you’re working on a large-scale proof-of-concept. Our team can help remove these roadblocks for you so that you can test with ease.

Hi Preston, I appreciate the timely reply.

We are early in our eval and not trying to do a full load test yet. The focus is on collaborating around a shared view of dashboards and sample data to clarify the feature requirements for our use cases. In the process of populating data for that, I ran into these issues, which I foresee would be an obstacle to our adopting Grafana Cloud (apart from the dashboard feature requests). Better to flag them now and let the various issues be worked in parallel.

I'm not worried about rate limiting; I take that in stride (especially given it's a trial, and an eventual large deployment would factor in scaling and ingestion rate issues).

Having no way to delete time series or backfill data is a bigger red flag. This has already come up during the past year of working with an on-prem deployment. If there is a way to delete/backfill with manual intervention via support or the account team, it stops being an absolute show-stopper… But it would be a priority feature ask to make this accessible with some safety guard rails, e.g. to fix things in the middle of the night when it might be hard to reach support staff.

Thanks.

One more subtle issue that I haven't fully analyzed: the 10-minute lookback window seems to be erroneously (or at least suboptimally) triggered…

I'm using VictoriaMetrics vmagent as a remoteWrite relay to get data into the Grafana Cloud Prometheus datastore. The start of my data pipeline is a custom set of scripts that push data via the Graphite ingestion protocol to vmagent; vmagent's remoteWrite capability then translates it to Prometheus format for ingestion at Grafana Cloud.
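
For concreteness, the pushing side looks roughly like this (a simplified sketch, not our actual scripts; host, port, tags, and values are placeholders, and it assumes vmagent was started with -graphiteListenAddr=:2003):

    import socket
    import time

    # vmagent's Graphite plaintext listener; vmagent then remote_writes
    # the translated samples to Grafana Cloud.
    VMAGENT_ADDR = ("localhost", 2003)

    def graphite_line(name, tags, value, ts):
        """Build one Graphite plaintext line with tags: name;k=v;k=v value timestamp."""
        tag_part = ";".join(f"{k}={v}" for k, v in sorted(tags.items()))
        return f"{name};{tag_part} {value} {int(ts)}"

    lines = [
        graphite_line(
            "net_if_byte_count",
            {"host": "example-host", "intf": "Gi1/0/5", "inout": "out"},
            123456,
            time.time(),
        ),
    ]

    with socket.create_connection(VMAGENT_ADDR) as sock:
        sock.sendall(("\n".join(lines) + "\n").encode())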

In our current small sample case, we have ~36,000 metrics with unique labels, and even trying to backfill data over 20-30 minutes we're getting blocks rejected… I suspect something about the data ordering is causing us to hit the 10-minute out-of-order window. Here's an example log we get back:

2023-11-06T23:01:24.018Z error VictoriaMetrics/app/vmagent/remotewrite/client.go:400 sending a block with size 135460 bytes to "1:secret-url" was rejected (skipping the block): status code 400; response body: failed pushing to ingester: user=1255853: the sample has been rejected because another sample with a more recent timestamp has already been ingested and this sample is beyond the out-of-order time window of 10m (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2023-11-06T22:36:00Z and is from series {__name__="net_if_byte_count", descr="DA88 Vnwiy Aiiyvv Pngf", host="lipvvti352.jfxm.mh.ngr", inSpeed="1000000000", inout="out", intf="Gi1/0/5", outSpeed="1000000000", speed="1000000000"} (sampled 1/10)

Given the large block size, I think the Grafana side might be receiving one time series from oldest to newest timestamp, then a different time series starting again from the oldest timestamp and running to the newest, potentially in the same block of data, and then the whole block is being rejected.

If this is indeed the issue, it would be less restrictive if the 10-minute look-back window and the comparison to the last timestamp happened on each unique time series instance (i.e. a unique set of labels for a metric) rather than on a per-block, per-metric-name, or other aggregated basis.

In any case, it seems we can only really populate data as a steady-state real-time process, and not as a bulk fill over some lookback window, unless we do a lot of customization on our data transmission side.
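
To spell out the kind of transmission-side customization I mean (a rough sketch, not something we've implemented; it assumes the backfill samples fit in memory): sort the whole backfill globally by timestamp before pushing, so that for any given series, samples always arrive oldest-to-newest:

    import time
    from collections import namedtuple

    Sample = namedtuple("Sample", ["name", "tags", "value", "ts"])

    def ordered_lines(samples):
        """Yield Graphite plaintext lines in global timestamp order, so no
        series ever sends a sample older than one it has already sent."""
        for s in sorted(samples, key=lambda s: s.ts):
            tag_part = ";".join(f"{k}={v}" for k, v in sorted(s.tags.items()))
            yield f"{s.name};{tag_part} {s.value} {int(s.ts)}"

    # Toy example: two series backfilled over 30 minutes, interleaved by timestamp.
    now = int(time.time())
    samples = [
        Sample("net_if_byte_count", {"intf": "Gi1/0/5", "inout": direction}, i, now - 60 * i)
        for direction in ("in", "out")
        for i in range(30)
    ]

    for line in ordered_lines(samples):
        print(line)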

Thanks,

Hi team,

Just wanted to let you know that a member of our growth team is going to be reaching out to you separately to discuss this further! Stay tuned for more 🙂

-Shelby at Cloud Success

Hi.

I just connected a simple minikube cluster to my Grafana Cloud stack using the default Prometheus Kube Stack, and I reached the 10k Prometheus metrics limit in less than half an hour. I then read the configuration page on reducing the number of metrics to be sent, but I can't test it since I've already hit the limit.

I really need to delete the stored data to see if the reduction helped, but I can't because there is no way to do that, other than deleting the current stack, creating a new one, and reconfiguring everything again. I would like to understand the reasoning behind this choice, but I would consider this issue a blocker for fully adopting Grafana Cloud.

Hi Alberto –

Because Grafana Cloud does not bill for metrics based on aggregated ingestion but rather on active series (learn more here), any reductions you make to bring your series count down today would be reflected almost immediately in your account, and more specifically in your Billing/Usage dashboard. Navigate to the "Metrics Active Series Details" row in the Billing/Usage dashboard and open it up to see your metrics ingestion patterns this month and confirm that the reductions you've made are reflected. If active series drop below 10K, you should see that there.

Either way, hitting this limit would not keep you from continuing to send and query metric data at Grafana Cloud for the remainder of the month. In fact, you're still ingesting metrics now; we're just limiting you to 10K active series' worth.

(As an inverse example, services like logs and traces use an ingestion-based billing model, so if you've hit your GB ingestion limit for those services, you would indeed not be able to ingest any more until the following calendar month, and you could not purge or reset that ingested data.)
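
If it helps make the distinction concrete, here's a toy sketch (illustrative only; this is not how Grafana Cloud actually counts, and the 30-minute window is just an assumption for the example):

    import time

    WINDOW_SECONDS = 30 * 60  # assumed window for this example, not an official value

    def active_series(samples, now=None):
        """samples: iterable of (labels_dict, timestamp, value) tuples.
        Counts distinct label sets seen within the window -- one active
        series no matter how many datapoints it sent."""
        now = now or time.time()
        recent = {
            tuple(sorted(labels.items()))
            for labels, ts, _ in samples
            if now - ts <= WINDOW_SECONDS
        }
        return len(recent)

    # One series scraped 1,000 times is still 1 active series, while an
    # ingestion-based model (as used for logs/traces) counts all 1,000 points.
    now = time.time()
    demo = [({"__name__": "up", "job": "minikube"}, now - i, 1.0) for i in range(1000)]
    print(active_series(demo, now), len(demo))  # -> 1 1000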

Hope that helps clarify.