I have a recording rule set up in Loki, sending its data to Mimir, yet the data never arrives and there are no errors in either Loki or Mimir. I have tried creating an alert rule, and that does work.
For alerts, I do see the alert triggering in Grafana Alerts; however, I'm also seeing the following in the logs:
Loki Backend:
caller=dedupe.go:112 storage=registry manager=tenant-wal instance=fake component=remote level=error remote_name=fake-rw-mimir url=http://mimir-nginx.mimir.svc:80/api/v1/push msg="non-recoverable error" count=3 exemplarCount=0 err="server returned HTTP status 400 Bad Request: failed pushing to ingester: user=fake: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2023-05-24T19:24:37.445Z and is from series {__name__=\"ALERTS_FOR_STATE\", alertname=\"Test Loki Alert Rule\", site=\"sitename\"}"
Mimir:
caller=grpc_logging.go:43 level=warn duration=1.021925ms method=/cortex.Ingester/Push err="rpc error: code = Code(400) desc = user=fake: the sample has been rejected because another sample with the same timestamp, but a different value, has already been ingested (err-mimir-sample-duplicate-timestamp). The affected sample has timestamp 2023-05-24T19:10:37.445Z and is from series {__name__=\"ALERTS_FOR_STATE\", alertname=\"Test Loki Alert Rule\", site=\"sitename\"}" msg=gRPC
It would appear that this alert is being generated by multiple Loki instances, all of which have the same labels. To work around this, you can add a label that uniquely identifies the pod from which the alert was generated.
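Something along these lines in the Loki config should do it. This is only a sketch: it assumes -config.expand-env=true is enabled so ${HOSTNAME} expands to the pod name, and that ruler external_labels are applied to the remote-written series and not only to alert notifications; the label name "pod" is illustrative.

# Sketch: give each ruler instance a distinguishing label on remote-written series,
# so replicas no longer produce identical series that collide in Mimir.
ruler:
  external_labels:
    pod: ${HOSTNAME}   # requires -config.expand-env=true (assumption)
  remote_write:
    enabled: true
    client:
      url: http://mimir-nginx.mimir.svc:80/api/v1/push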
It appears that you’re using the simple-scalable Helm chart; is that correct?
This might be a bug in our chart if you have loki-backend scaled > 1.
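If so, a quick way to confirm would be to temporarily run a single backend replica and see whether the duplicate-sample errors stop. A values.yaml sketch, assuming the simple scalable deployment mode where the ruler runs in the backend target:

# Temporarily scale loki-backend down to one replica so only one ruler
# instance evaluates and remote-writes the rules.
backend:
  replicas: 1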
The alert rule is still firing in Loki, but it's not getting sent to the Mimir Alertmanager so it can be forwarded to OnCall.
The recording rule is still running on the single Loki backend without error. In the Mimir distributor, the push error for the rejected sample went from occurring constantly to only occasionally; I'm guessing the timestamps from the Loki backend pods are no longer aligning.
However, the recording rule is evaluated every 60 seconds and I'm not seeing a constant error, yet it's not showing up anywhere in Mimir.
@dannykopping, any thoughts on what else I can try, or how to turn up logging? I figured I'd see an error somewhere, on either the Loki side or the Mimir side, if the metrics and alerts weren't being sent or recorded properly. It's very odd.
I’m wondering if it has something to do with tenancy. I don't use multi-tenancy in either Mimir or Loki. The Mimir nginx config uses:
# Ensure that X-Scope-OrgID is always present, default to the no_auth_tenant for backwards compatibility when multi-tenancy was turned off.
map $http_x_scope_orgid $ensured_x_scope_orgid {
  default $http_x_scope_orgid;
  ""      "anonymous";
}
I'm wondering if Mimir is just discarding the writes silently?
This isn't a great situation. I disabled multi-tenancy in Mimir, and now my Loki rules are working. There has to be an easier, more logical way of handling this. Fortunately I don't need multi-tenancy in Mimir at this time, but it may be in our future. If I need multi-tenancy in Mimir but not in Loki, what happens?
I haven't had to use remote_write from the ruler yet, but since much of the Loki ruler's implementation is very similar to Prometheus, I wonder whether the remote_write configuration also supports headers and basic_auth.
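If the client block does follow the Prometheus remote-write client config (which I believe it does), something like this would pin the tenant explicitly instead of relying on the gateway's fallback. A sketch; the tenant name and credentials are placeholders, and basic_auth is only needed if the Mimir gateway enforces it:

ruler:
  remote_write:
    enabled: true
    client:
      url: http://mimir-nginx.mimir.svc:80/api/v1/push
      # Write to an explicit Mimir tenant instead of relying on the gateway's
      # no_auth_tenant fallback.
      headers:
        X-Scope-OrgID: anonymous
      # Placeholder credentials, only needed if the gateway requires basic auth.
      basic_auth:
        username: loki-ruler
        password: example-password

That would also cover the case of multi-tenancy enabled in Mimir but not in Loki, since the header decides which Mimir tenant receives the ruler's samples.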
This has come up a few times. Loki chose fake previously (which I hate, by the way, and which causes so much confusion), and Mimir chose anonymous. We might change this in Loki v3, but it's complicated in terms of backwards compatibility.
It may be easier to update the Mimir gateway to rewrite the fake header value to anonymous? That still doesn't change fake to something better, though. Personally, I would have gone with "default" as the default tenant.
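Roughly, that rewrite could be done by extending the map that's already in the Mimir nginx config; a sketch, untested:

map $http_x_scope_orgid $ensured_x_scope_orgid {
  default $http_x_scope_orgid;
  # Fold Loki's default single-tenant ID into Mimir's default tenant.
  "fake"  "anonymous";
  ""      "anonymous";
}

With that in place, requests arriving from Loki with X-Scope-OrgID: fake would be stored under the anonymous tenant rather than a separate fake tenant.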