Loki+promtail error 429

Good afternoon. I have a container on Proxmox running a Loki server, and I need to collect nginx logs with the Promtail agent. Everything works well with one machine: its logs are collected and parsed (300-500 MB per day on that server). The problem appears when I connect a second server (9-10 GB per day): I start getting 429 errors, "maximum active stream limit exceeded, reduce the number of active streams…". I have been trying both large and small limits for a long time and cannot get past it.
Loki was installed from GitHub.
How do I set the limits and configure streams? I can't get anywhere with the documentation.

My Loki config, local-config.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9095
  http_server_read_timeout: 310s
  http_server_write_timeout: 310s
  grpc_server_max_recv_msg_size: 9663676416
  grpc_server_max_send_msg_size: 9663676416
  grpc_server_max_concurrent_streams: 0

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr:
    kvstore:
      store: inmemory

ingester:
  lifecycler:
    address: localhost
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 15m
  chunk_target_size: 1572864
  chunk_retain_period: 0s
  chunk_encoding: snappy
  max_chunk_age: 12h
  max_transfer_retries: 0

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
        ttl: 2h

schema_config:
  configs:
    - from: 2020-11-01
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_loki
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h

compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 120h
  retention_delete_worker_count: 150
  delete_request_cancel_period: 120

limits_config:
  max_entries_limit_per_query: 5000
  max_streams_per_user: 100000
  max_chunks_per_query: 200000
  reject_old_samples: true
  reject_old_samples_max_age: 24h
  ingestion_rate_mb: 1000
  ingestion_burst_size_mb: 1500
  retention_period: 120h
  max_query_lookback: 5d
  max_query_series: 100000
  per_stream_rate_limit: "512MB"
  per_stream_rate_limit_burst: "1024MB"
  max_global_streams_per_user: 524288000

frontend:
  max_outstanding_per_tenant: 10000
  compress_responses: true

ingester_client:
  grpc_client_config:
    max_send_msg_size: 9663676416

query_scheduler:
  max_outstanding_requests_per_tenant: 10000
  grpc_client_config:
    max_send_msg_size: 9663676416

table_manager:
  retention_deletes_enabled: true
  retention_period: 120h

ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://loki.test.com:9093
  enable_alertmanager_v2: true

My Promtail config, promtail.yaml:
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml
  sync_period: 10s
  ignore_invalid_yaml: false

clients:

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - <server_with_logs>
        labels:
          job: nginx_log
          __path__: /var/log/nginx/*log
    pipeline_stages:
      - match:
          selector: '{job="nginx_log"}'
          stages:
            - regex:
                expression: '^(?P<time_local>[^ ]+.*)\s+\[(?P<error_type>.+)\].*((?P<errormessage>.+).*\sclient:)\s(?P<client_ip>.+).*\sserver:\s(?P<server>.+).*\srequest:\s"(?P<request_method>[\w]*)\s(?P<request_uri>[^ ]*)\s(?P<http_version>[^ ]*)".*\shost:\s"(?P<host>.+)"$'
            - regex:
                expression: '^"(?P<time_local>.*)"\sclient=(?P<client_ip>.+)\smethod=(?P<request_method>.*)\srequest="(?P<request>[^ ]*\s[^ ]*)\s(?P<http_version>[^ ]*)"\srequest_length=(?P<request_length>[^ ]*)\sstatus=(?P<response_status>[^ ]*)\sbytes_sent=(?P<bytes_sent>[^ ]*)\sbody_bytes_sent=(?P<body_bytes_sent>[^ ]*)\sreferer=(?P<http_referer>[^ ]*)\suser_agent="(?P<http_user_agent>.+)"\supstream_addr=(?P<upstream_server>[^ ]*)\supstream_status=(?P<upstream_status>[^ ]*)\srequest_time=(?P<request_time>[\d.]*)\supstream_response_time=(?P<response_time>[\d.]*)\supstream_connect_time=(?P<upstream_connect_time>[\d.]*)\supstream_header_time=(?P<upstream_header_time>[\d.]*)$'
            - labels:
                time_local:
                client_ip:
                request:
                response_status:
                upstream_status:
                request_time:
                response_time:
                error_type:
                errormessage:

My logrotate config:
/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 640 nginx adm
    sharedscripts
    postrotate
        if [ -f /var/run/nginx.pid ]; then
            kill -USR1 `cat /var/run/nginx.pid`
        fi
    endscript
}

Try setting this:

limits_config:
  ingestion_rate_strategy: local

For more explanation on what this is, please see Grafana Loki configuration parameters | Grafana Loki documentation.

I specified this limit. However, almost immediately after Promtail starts, I see the following messages: msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Maximum active stream limit exceeded, reduce the number of active streams (reduce labels or reduce label values) or contact your Loki administrator to see if it can be increased"

As stated by the error log, you have too many streams, and judging by your configuration I would say the cause is too many labels. Out of the ones from your configuration above:

labels:
  time_local:
  client_ip:
  request:
  response_status:
  upstream_status:
  request_time:
  response_time:
  error_type:
  errormessage:

I’d say probably none of them should be labels (see Best practices | Grafana Loki documentation for more explanation). So I’d try removing all labels, and perhaps also take a look at limits_config and increase the number of streams somewhat.
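To make that concrete, here is a rough sketch of a trimmed-down pipeline. Keeping only response_status and error_type as labels is my own example choice, not something from your setup; everything else stays in the log line and can be pulled out at query time:

    pipeline_stages:
      - match:
          selector: '{job="nginx_log"}'
          stages:
            - regex:
                expression: '<your existing nginx regex>'
            # Only bounded, low-cardinality fields become labels.
            # client_ip, request, request_time, response_time and so on
            # stay in the log line and never create extra streams.
            - labels:
                response_status:
                error_type:

With that change each server should only produce a few dozen streams at most (roughly one per status code / error level), instead of one per client, request and timing value.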


If I remove the labels, then how do I build a query in dashboards for filtering? The message will come in as a plain string without labels, and I need, for example, to filter by status=500. I can’t do that without a label.

See LogQL: Log query language | Grafana Loki documentation on how to parse logs in Loki.
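For example (just a sketch, assuming the key=value access log format from your second regex; the extracted name status is my own choice), you can parse at query time and filter on the extracted value without it ever being a label:

# parse the line at query time, then filter on the extracted value
{job="nginx_log"} |= "status=" | regexp `status=(?P<status>\d+)` | status = "500"

# the same extraction works in metric queries, e.g. requests per status code
sum by (status) (count_over_time({job="nginx_log"} | regexp `status=(?P<status>\d+)` [5m]))

Grafana dashboards and Explore run these LogQL queries directly, so filtering still works even though the field is not an index label.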

@tonyswumac How does that work alongside using the built-in geoip stage for Promtail? That adds labels for geoip data to the logs and seems to create a separate stream for each of these labels. Is there a different solution for decorating logs with geoip information within Grafana?

I’ve not had to use the geoip stage, so I may be incorrect in some of my suggestions below.

  1. Just let it be. The recommendation isn’t for labels to be “small”, it’s for the labels to be “bounded”, meaning the label values should always have a limit. If you scroll up a bit to my previous comment, things such as request_time and response_time were pretty obviously not suited to be labels. So if you use the country code as a label, I don’t think that’s a bad practice.

  2. There is a feature called structured metadata that might be helpful: What is structured metadata | Grafana Loki documentation. Again I’ve not used this, so I don’t know for sure if it’s useful or not.
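If you do try it, a rough sketch on the Promtail side could look like the snippet below. This is untested on my end, it assumes a Loki/Promtail version and schema that actually support structured metadata (TSDB index with schema v13 and allow_structured_metadata enabled, which the boltdb-shipper/v11 setup earlier in this thread would not qualify for), and the field name is only an example:

    pipeline_stages:
      - regex:
          expression: 'client=(?P<client_ip>[^ ]+)'
      - structured_metadata:
          # attached to each log entry and queryable, but does not
          # create new streams the way the labels stage does
          client_ip:

The same idea should apply to geoip-derived values: keep something bounded like the country code as a label if you want, and push the unbounded fields into structured metadata.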

To do this, add the following to the Promtail Helm chart:

config:
  snippets:
    pipelineStages:
      - labels:
          time_local:
          client_ip:
          request:
          response_status:
          upstream_status:
          request_time:
          response_time:
          error_type:
          errormessage: