Apologies if this is the wrong category or a duplicate. I searched extensively and found very similar topics on GitHub and the Grafana Labs forums. While they were useful for troubleshooting and for understanding how the Promtail configuration file works, they did not help resolve the problem.
I also checked the Loki and Promtail documentation, including the timestamp stage (timestamp | Grafana Loki documentation).
Setup:
- Computer A: Windows 10 Enterprise 2016 LTSB with Prometheus 2.48.1, Grafana 10.2.2.0 and Loki 2.9.3
- Computer B: Windows 10 Enterprise IoT LTSC with Promtail 2.9.3
- Both computers are on the same local domain
- The computer clocks are not synchronised; Computer A is about 30 seconds ahead of Computer B
- They are not synced because Computer B is part of a “test lab” where all computers are synchronised to a special real-time device. Computer A is not part of that “test lab” and needs to stay separate from it and from other test labs with similar devices.
- There is no Internet connection and no NTP server in this test environment
- There are logs on Computer B that I would like to monitor.
- Example log entries (about 2 seconds' worth of data):
(timestamp, telegram header, value, code)
2023-12-20 13:28:42.004Z ← $HEROT,0.016,A*2C
2023-12-20 13:28:42.004Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.082Z ← $HEROT,0.016,A*2C
2023-12-20 13:28:42.108Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.185Z ← $HEROT,0.016,A*2C
2023-12-20 13:28:42.211Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.315Z ← $HEROT,0.016,A*2C
2023-12-20 13:28:42.315Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.471Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.471Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.497Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.523Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.601Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.601Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.679Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.705Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.783Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.809Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.939Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.965Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:42.991Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:42.991Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.094Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.094Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.198Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.198Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.301Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.301Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.457Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.457Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.483Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.509Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.586Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.611Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.689Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.715Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.791Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.817Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.945Z ← $HEROT,0.017,A*2D
2023-12-20 13:28:43.971Z ← $HETHS,92.49,A*2B
2023-12-20 13:28:43.997Z ← $HEROT,0.017,A*2D
- The logged-on users on Computers A and B are members of the Administrators group.
Promtail configuration on Computer B:
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: C:\\Promtail\\promtail-positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: HDG_logs
          #use_incoming_timestamp: true
          __path__: 'C:\\Logging\\Sensors\\HDG-1*.log'
    pipeline_stages:
      - regex:
          expression: '^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}Z) <- \$(?P<telegram>[A-Z]+),(?P<value>-?\d+\.\d+),[A-Z]+\*(?P<status>[0-9A-F]{2})$'
      - timestamp:
          source: time
          format: '2006-01-02 15:04:050.000Z'
      - labels:
          time:
          telegram:
          value:
          status:
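To sanity-check the regex stage on its own, below is a minimal Go sketch that matches the expression above against one of the sample log lines. This is only a sketch under two assumptions: that Promtail's regex stage behaves like Go's regexp package (the stage is documented as using RE2 syntax), and that the separator in the raw log file really is the two-character sequence "<-" as written in the expression (the forum rendering of my sample lines shows an arrow character).

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Same expression as in the pipeline_stages above.
    expr := `^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}Z) <- \$(?P<telegram>[A-Z]+),(?P<value>-?\d+\.\d+),[A-Z]+\*(?P<status>[0-9A-F]{2})$`
    re := regexp.MustCompile(expr)

    // One of the sample log lines, assuming "<-" is the literal separator in the file.
    line := `2023-12-20 13:28:42.004Z <- $HEROT,0.016,A*2C`

    match := re.FindStringSubmatch(line)
    if match == nil {
        fmt.Println("no match")
        return
    }
    for i, name := range re.SubexpNames() {
        if i > 0 && name != "" {
            fmt.Printf("%s = %s\n", name, match[i])
        }
    }
}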
Loki configuration on Computer A:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  instance_addr: 127.0.0.1
  path_prefix: C:\\Temp\\Loki
  storage:
    filesystem:
      chunks_directory: C:\\Temp\\Loki\\chunks
      rules_directory: C:\\Temp\\Loki\\rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2023-12-01
      store: tsdb
      object_store: filesystem
      schema: v12
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
Error messages when starting Promtail on Computer B:
level=error ts=2023-12-20T14:12:33.7294912Z caller=client.go:430 component=client host=172.20.101.200:3100 msg="final error sending batch" status=400 tenant= error="server returned HTTP status 400 Bad Request (400): 14 errors like: entry for stream '{filename=\"C:\\\\Logging\\\\Sensors\\\\HDG-1_2023.12.20_14.05.23.log\", job=\"HDG_logs\"}' has timestamp too new: 2023-12-20T14:12:33Z; 10 errors like: entry for stream '{filename=\"C:\\\\Logging\\\\Sensors\\\\HDG-1_2023.12.20_14.05.23.log\", job=\"HDG_logs\"}' has timestamp too new: 2023-12-20T14:12:32Z"
Running Promtail with --log.level=debug does not reveal any information that points to the root cause of the problem.
Output in Loki terminal on Computer A:
level=info ts=2023-12-20T13:14:08.9449325Z caller=flush.go:167 msg="flushing stream" user=fake fp=439a7ac464b4f701 immediate=false num_chunks=1 labels="{filename=\"C:\\\\Logging\\\\Sensors\\\\HDG-1_2023.12.20_14.05.23.log\", job=\"HDG_logs\"}"
level=info ts=2023-12-20T13:19:25.4040061Z caller=table_manager.go:136 index-store=tsdb-2023-12-01 msg="uploading tables"
level=info ts=2023-12-20T13:20:25.403661Z caller=table_manager.go:136 index-store=tsdb-2023-12-01 msg="uploading tables"
level=info ts=2023-12-20T13:21:25.4043164Z caller=table_manager.go:136 index-store=tsdb-2023-12-01 msg="uploading tables"
level=info ts=2023-12-20T13:21:39.4803232Z caller=flush.go:167 msg="flushing stream" user=fake fp=439a7ac464b4f701 immediate=false num_chunks=1 labels="{filename=\"C:\\\\Logging\\\\Sensors\\\\HDG-1_2023.12.20_14.05.23.log\", job=\"HDG_logs\"}"
level=info ts=2023-12-20T13:22:25.4039709Z caller=table_manager.go:136 index-store=tsdb-2023-12-01 msg="uploading tables"
level=info ts=2023-12-20T13:22:25.4089746Z caller=table_manager.go:228 index-store=tsdb-2023-12-01 msg="syncing tables"
ts=2023-12-20T13:22:25.4089746Z caller=spanlogger.go:86 level=info msg="building table names cache"
How it looks in Grafana:
My suspicions revolve around two possible issues:
a) Something to do with the timestamp stage and its format string in the Promtail configuration (I have spent a lot of time experimenting with different settings). For example, it is a bit strange that the Promtail error quotes the timestamp without fractional seconds, i.e. “timestamp too new: 2023-12-20T14:12:32Z”. A small Go sketch of this is included after this list.
b) There are too many log entries per second, which causes a problem during parsing.
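Regarding a), here is the small Go sketch mentioned above. It assumes the timestamp stage interprets the format string as a Go reference-time layout (as the documentation describes) and parses the extracted value roughly the way time.Parse does; the second layout is only my guess at what should match the log lines, not something I have confirmed:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Timestamp taken from one of the sample log lines.
    value := "2023-12-20 13:28:42.004Z"

    layouts := []string{
        "2006-01-02 15:04:050.000Z", // layout currently in my promtail config
        "2006-01-02 15:04:05.000Z",  // layout I would expect to match the log lines (my guess)
    }

    for _, layout := range layouts {
        t, err := time.Parse(layout, value)
        fmt.Printf("layout %q -> time=%v err=%v\n", layout, t, err)
    }
}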
I would appreciate any suggestions on how to investigate this further.