I have set up a basic install of Grafana/Promtail/Loki to have a convenient way to see the number of times an error shows up in my logs. My issue is that it gives me different results from one minute to the next. I took these screenshots to help explain.
This first screenshot shows the results around 2 pm (time shown at the top).
I left my browser open on the same panels until about 6 pm and took another screenshot, just for documentation (time at top):
I then changed the timeframe on the panel to 24 hours. That should return all of the same results, plus anything else that happened in the past day, including anything that happened since the first time I checked. However, the results are almost completely different:
As you can see, the two 404 errors for login.php are no longer there, nor is the one for simple.php. This doesn't make sense; those should still be there, because based on the first time I visited the page, they must've happened just earlier this morning.
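For context, the panels are driven by simple LogQL filter queries along these lines (simplified; the exact label filters vary per panel, and the job label comes from the Promtail config further down):

{job="homf"} |= "404"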
I've been having these issues consistently across many different setups: sometimes it shows different results, sometimes it shows no results at all. In my most recent attempt, I did a very basic install using Loki's suggested docker-compose.yaml, with a couple of changes for my needs.
docker-compose.yaml
version: "3"
networks:
loki:
services:
loki:
container_name: loki-logs
image: grafana/loki:2.9.0
restart: unless-stopped
ports:
- "3100:3100"
volumes:
- /var/lib/docker/volumes/loki-auto-time/etc/loki/local-config.yaml:/etc/loki/local-config.yaml
command: -config.file=/etc/loki/local-config.yaml
networks:
- loki
promtail:
container_name: promtail-logs
image: grafana/promtail:2.9.0
restart: unless-stopped
volumes:
- /var/log:/var/log
- /var/www/vhosts:/var/www/vhosts
- /var/lib/docker/volumes/promtail-auto-time/etc/promtail/config.yml:/etc/promtail/config.yml
command: -config.file=/etc/promtail/config.yml
networks:
- loki
grafana:
container_name: granfana-logs
environment:
- GF_PATHS_PROVISIONING=/etc/grafana/provisioning
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
entrypoint:
- sh
- -euc
- |
mkdir -p /etc/grafana/provisioning/datasources
cat <<EOF > /etc/grafana/provisioning/datasources/ds.yaml
apiVersion: 1
datasources:
- name: Loki
type: loki
access: proxy
orgId: 1
url: http://loki:3100
basicAuth: false
isDefault: true
version: 1
editable: false
EOF
/run.sh
image: grafana/grafana:latest
restart: unless-stopped
ports:
- "3000:3000"
networks:
- loki
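To bring the stack up and make sure Loki is actually ready before I start querying, I do roughly the following (the /ready endpoint is Loki's standard readiness check; the container names are the ones from the compose file above):

docker compose up -d
# Loki answers "ready" once the ingester has joined the ring
curl -s http://localhost:3100/ready
# quick sanity check that both containers started cleanly
docker logs loki-logs --tail 20
docker logs promtail-logs --tail 20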
Loki config
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093

# Added by Brian
# https://community.grafana.com/t/too-many-outstanding-requests-on-loki-2-7-1/78249/11
# https://github.com/grafana/loki/issues/4613
limits_config:
  split_queries_by_interval: 0
query_scheduler:
  max_outstanding_requests_per_tenant: 2048

# By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
# analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
#
# Statistics help us better understand how Loki is used, and they show us performance
# levels for most users. This helps us prioritize features and documentation.
# For more information on what's sent, look at
# https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
# Refer to the buildReport method to see what goes into a report.
#
# If you would like to disable reporting, uncomment the following lines:
#analytics:
#  reporting_enabled: false
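To double-check that the added limits are actually being picked up, Loki can be asked for its effective configuration over HTTP, for example:

# dump the running configuration and confirm the query-splitting override took effect
curl -s http://localhost:3100/config | grep split_queries_by_interval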
Promtail config
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: homf
    pipeline_stages:
      - match: # Parse the parts of Yii's log files to get labels
          selector: '{job="homf"}'
          stages:
            - regex:
                expression: '(?P<time>\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d) \[(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})\]\[(?P<userId>.+?)\]\[.+?\]\[(?P<level>.+?)\]\[(?P<category>.+?)\] (?P<msg>[^\$].+)'
            - labels:
                time:
                ip:
                userId:
                level:
                category:
                msg:
            # - timestamp:
            #     source: time
            #     format: "2006-01-02 15:04:05"
      - match: # Drop anything that doesn't have a message
          selector: '{job="homf", msg!~"(.+)"}'
          action: drop
    static_configs:
      - targets:
          - localhost
        labels:
          job: homf
          __path__: /var/www/vhosts/example.org/httpdocs/*/runtime/logs/*.log
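To rule out the pipeline itself, Promtail can be fed sample lines in dry-run mode, which prints the parsed entry (timestamp, labels, line) instead of shipping it to Loki. Something like this, where sample.log is a placeholder for one of the files matched by the __path__ glob above:

cat sample.log | docker run -i --rm \
  -v /var/lib/docker/volumes/promtail-auto-time/etc/promtail/config.yml:/etc/promtail/config.yml \
  grafana/promtail:2.9.0 --stdin --dry-run --config.file=/etc/promtail/config.yml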
Why would this be happening? I've tried to find a way to query Loki for "everything" so I can tell whether these entries are really disappearing or whether something else is going on, but I can't find a good way to troubleshoot this.
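To be clear about what I mean by "everything": the closest thing seems to be asking Loki directly for every entry under the job label over a wide window, either with logcli or the raw query_range API (addresses and ports here match the compose file above):

# everything for the job over the last 24 hours, via logcli
logcli query '{job="homf"}' --addr=http://localhost:3100 --since=24h --limit=5000

# the same thing via the HTTP API (start/end are unix epoch nanoseconds)
curl -G -s 'http://localhost:3100/loki/api/v1/query_range' \
  --data-urlencode 'query={job="homf"}' \
  --data-urlencode "start=$(date -d '24 hours ago' +%s)000000000" \
  --data-urlencode "end=$(date +%s)000000000" \
  --data-urlencode 'limit=5000'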
I created another topic with a lot more detail on other things I've tried. It's a bit verbose, but there may be some useful information there (Need help with best practices for debugging promtail - loki and grafana too).