-
What Grafana version and what operating system are you using?
Currently running Beyla - 1.8.8
Prometheus - 2.55.1
-
What are you trying to achieve?
We enabled Beyla in our development environment as a proof of concept. The goal is to have Beyla, and eventually Tempo, running in this environment to showcase the observability story and then build it out across all of our clusters and AWS accounts.
-
How are you trying to achieve it?
By installing with the available Helm charts.
-
What happened?
Our Prometheus pod, installed via the prometheus-community Helm chart, typically sits around 3.6 GB of memory utilization in this environment. When Beyla gets enabled, the cardinality on some metrics spikes dramatically over a short period (minutes) and memory consumption rises until the pod gets OOM killed. Looking at the “tsdb-status” page, the cardinality top 10 shows one label (server_port) with 30k+ values while all the rest are at 5k or below. Looking at the job itself I do not see that label on the job, which makes it seem like it sits directly on the metrics themselves. I tried dropping it with a metric relabel rule, but that did not seem to work.
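For reference, a plain PromQL query along these lines (a sketch; the only thing assumed is the server_port label name taken from the tsdb-status page) shows which metric names carry that label and how many distinct values each one has:
# top 10 metric names by number of distinct server_port values
topk(10, count by (__name__) (count by (__name__, server_port) ({server_port!=""})))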
-
What did you expect to happen?
We expected SOME increase in cardinality and memory consumption, but not more than 24 GB.
-
Can you copy/paste the configuration(s) that you are having problems with?
Not sure what is needed here. I tried adding the following settings to Prometheus, which did not help the situation at all:
max-block-duration: 15m
# min-block-duration: 15m
# head-chunks-write-queue-size: 500
# max-series-per-shard: 2500
# max-series: 10000
# max-bytes-to-drop: 5368709120
# wal-compression: true
# head-chunks-limit: 50000
# out-of-order-time-window: 5m
# max-exemplars: 50
# no-lockfile: true
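For context, flags like the block-duration and WAL-compression settings above would normally be passed through the chart values, roughly as in the sketch below (assuming the plain prometheus-community/prometheus chart; the server.extraFlags / server.extraArgs keys and the exact flag names should be double-checked against the chart and Prometheus version in use):
server:
  extraFlags:
    - storage.tsdb.wal-compression          # boolean flags
  extraArgs:
    storage.tsdb.min-block-duration: 15m    # value-carrying flags
    storage.tsdb.max-block-duration: 15m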
-
Did you receive any errors in the Grafana UI or in related logs? If so, please tell us exactly what they were.
No errors, just the pods getting OOM killed.
-
Did you follow any online instructions? If so, what is the URL?
A co-worker went through some AI suggestions and the Beyla documentation before doing the initial setup.
So the ask is to help drop the server_port label properly and/or reduce the overall memory footprint, so that Beyla does not cause Prometheus to become unstable.
Thanks for the response, but that document points to only a few attributes enabled by default. Specifically:
" By default, only the following attributes are reported: k8s.src.owner.name
, k8s.src.namespace
, k8s.dst.owner.name
, k8s.dst.namespace
, and k8s.cluster.name
."
None of these is server_port, which is the label with the high cardinality. This is, again, applied at the metric level and not at the pod level coming from the kubernetes-pods job in Prometheus.
So, if I am reading the provided documentation correctly, the label should already be excluded as an attribute from the metrics Beyla provides, but it isn't? What steps can be taken here to verify and correct this?
I read through that document and can see that the network.flow.bytes metric is the one that has the server_port attribute. I have tried adding drop rules both on the scrape job and on the export side:
prometheus_export:
  port: 9090
  path: /metrics
  metric_relabel_configs:
    - action: labelkeep
      regex: ^(?!server_port$).*$
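      # note: this pattern relies on a negative lookahead, which the RE2 engine used
      # for Prometheus-style relabeling does not support, so it cannot match as written;
      # an action: labeldrop with regex: server_port is the usual way to express
      # "remove just this one label"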
and in Prometheus:
- honor_labels: true
  job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - action: drop
      regex: 'network_flow_bytes'
      source_labels: ['__name__']
    - action: labeldrop
      regex: 'server_port'
    - action: keep
      regex: true
      source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
Neither of which prevented the cardinality spike and the OOM kills.
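One thing I am not sure about is whether the labeldrop has to live in metric_relabel_configs instead, since relabel_configs appears to operate on target/service-discovery labels before the scrape while server_port only exists on the scraped metrics. A minimal sketch of that variant under the same job would be:
  metric_relabel_configs:
    - action: labeldrop
      regex: server_port
      # caveat: if two series differ only by server_port, dropping the label
      # makes them collide and Prometheus may reject them as duplicates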
Also, apologies for the forum parsing making the configs look weird; just assume they are properly indented and have the correct number of underscore characters.
If I am missing something obvious, feel free to point it out. I am just trying to get this to be stable and not consume 30 GB of RAM in one pod.
Why are you dropping on the Prometheus side when you can specify which attributes are generated on the Beyla side? From the already linked Beyla doc:
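If I remember the doc correctly, it is the attributes / select section, roughly like this (a sketch only; whether the metric key needs the beyla_ prefix or the dotted form, and the exact attribute names, should be verified against the doc and the source for your Beyla version):
attributes:
  select:
    beyla_network_flow_bytes:
      include:
        - k8s.src.owner.name
        - k8s.src.namespace
        - k8s.dst.owner.name
        - k8s.dst.namespace
        - k8s.cluster.name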
Sorry for the delayed reply; it was a holiday in the US. With that said, the reason we were trying to drop it on the Prometheus side (and I apologize for not being more clear) is that the above configuration had already been attempted in a couple of different ways unsuccessfully: attempting to turn it off in Beyla, in the exporter, and also in Prometheus itself.
However, in the interest of leaving no stone unturned, we used the above verbatim and re-created the Beyla pods/infrastructure.
The result was that server_port STILL shows up on the metrics. From the documentation I only saw it associated with network_flow_bytes, but in the metric output I also see it on http_client_request_body_size_bytes_bucket.
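If that is the case, presumably the select section would have to cover the HTTP metrics as well, something like the following (untested; I do not know whether an exclude list is accepted here or whether the metric key needs a prefix or suffix, so treat it purely as a sketch):
attributes:
  select:
    http_client_request_body_size:
      exclude:
        - server.port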
If we take a step back, there are really two issues at play:
1. Beyla causes high cardinality and extreme memory usage in its default configuration
2. This one metric in particular has exceedingly high cardinality
I have made an assumption that 2. is the cause of 1.
It is possible they have very little to do with each other.
So we should do a couple of things in this thread, if possible:
- Identify the amount of memory that would be appropriate or optimal for running a Prometheus pod collecting Beyla metrics
- Identify potential causes and solutions for tuning the memory utilization down to a normal level, so that the infrastructure can be run successfully regardless of which environment I am running it in.
It is an experimental feature - it can be changed anytime:
Don't trust the doc alone - check the source code of your version when you want to be 100% sure.
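For example, something like this is enough to see where a given release defines the attribute (a sketch; the v1.8.8 tag name is an assumption based on the version you mentioned, and the grep pattern is only a starting point since the attribute naming in the source may differ):
# clone the exact release and search for the attribute definition
git clone --depth 1 --branch v1.8.8 https://github.com/grafana/beyla
grep -rn --include='*.go' 'server.port' beyla | head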
1. Beyla causes high cardinality and extreme memory usage in its default configuration
2. This one metric in particular has exceedingly high cardinality
But that’s mentioned in the doc:
And you still have the option to tweak the Beyla config for your needs.
Another option is to tweak your TSDB storage so it doesn't have a problem with high cardinality: