Optimizing Prometheus Configuration for Enhanced Performance

danilkastranicadosta · April 15, 2024, 8:38am

Hello,

I am looking to optimize my Prometheus configuration to improve its performance, particularly the scrape settings and job configurations. We have already increased the timeout settings as far as we can, and I suspect the problem may lie in how the different target groups are configured. Here is the relevant part of my config:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  scrape_protocols:
    - OpenMetricsText1.0.0
    - OpenMetricsText0.0.1
    - PrometheusText0.0.4
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 1m
    scrape_timeout: 20s
    enable_compression: true
    enable_http2: true
    static_configs:
      - targets: ['xxxx:8180', 'xxxx:8180…']
        labels:
          group: 'various_groups'

I have several questions:

1. Compression and transport methods: I currently have enable_compression: true and enable_http2: true set for my Prometheus and Pushgateway jobs. Should I apply these settings to all jobs to reduce network load and speed up data transfer?
2. Job segmentation: How can I split scrape_configs into smaller, more manageable jobs? Many of my targets are currently grouped under a single job, and splitting them by type or priority might improve performance.
3. Remote write optimization: Given my current remote_write settings, how can I further tune queue_config parameters such as capacity and max_shards to improve throughput without overwhelming resources?
4. Metric filtering: Could you suggest metrics that are commonly dropped in setups like mine to reduce load? I am already using relabel_configs to filter out unnecessary metrics, but specific examples of metrics that can be excluded would help.
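For the job segmentation question, a split by priority might look roughly like the sketch below. The job names (critical_services, batch_services), targets, group labels, and interval values are all hypothetical placeholders, not recommendations. It also shows that enable_compression and enable_http2 are set per scrape_config, so if they are wanted everywhere they have to be repeated in each job. Each job keeps scrape_timeout no larger than its scrape_interval, which Prometheus requires.

```yaml
scrape_configs:
  # Hypothetical high-priority job: scraped more often with a short timeout.
  - job_name: critical_services
    scrape_interval: 15s
    scrape_timeout: 10s
    enable_compression: true
    enable_http2: true
    static_configs:
      - targets: ['host-a.example:8180']   # hypothetical target
        labels:
          group: 'critical'

  # Hypothetical low-priority job: scraped less often with a longer timeout.
  - job_name: batch_services
    scrape_interval: 2m
    scrape_timeout: 30s
    enable_compression: true
    enable_http2: true
    static_configs:
      - targets: ['host-b.example:8180']   # hypothetical target
        labels:
          group: 'batch'
```

Splitting this way lets slow or low-value targets use a longer interval without slowing down the scrapes that matter most.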
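For the remote write question, capacity and max_shards live under queue_config in each remote_write entry. The sketch below assumes a hypothetical endpoint URL and uses illustrative numbers only. Sensible values depend on ingestion rate and on how fast the remote endpoint accepts writes, so they should be adjusted while watching Prometheus's own remote-write metrics rather than copied as-is.

```yaml
remote_write:
  - url: https://remote-storage.example/api/v1/write   # hypothetical endpoint
    queue_config:
      capacity: 10000            # samples buffered per shard (illustrative)
      max_shards: 50             # upper bound on parallel senders (illustrative)
      min_shards: 1              # shards started with before scaling up
      max_samples_per_send: 2000 # batch size per request (illustrative)
      batch_send_deadline: 5s    # send a partial batch after this long
      min_backoff: 30ms          # first retry delay on recoverable errors
      max_backoff: 5s            # cap on retry delay
```

Raising max_shards increases parallelism (and memory and connections), while raising capacity mainly adds buffering per shard, so the two are usually adjusted together.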
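For the metric filtering question, one point worth noting: relabel_configs is applied to targets before the scrape, while dropping individual scraped series is done with metric_relabel_configs. The sketch below drops Go runtime and process metrics, which are often unused, plus the buckets of a hypothetical histogram named http_request_duration_seconds. Whether any of these are safe to drop depends on which dashboards and alerts rely on them.

```yaml
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['xxxx:8180']
    metric_relabel_configs:
      # Drop Go GC/memstats and process_* metrics if nothing queries them.
      # Prometheus anchors the regex, so it must match the whole metric name.
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*|process_.*'
        action: drop
      # Drop the buckets of a hypothetical high-cardinality histogram,
      # keeping its _sum and _count series.
      - source_labels: [__name__]
        regex: 'http_request_duration_seconds_bucket'
        action: drop
```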

I appreciate any insights or recommendations you could offer based on your experience. Thank you!