Cannot query historical data

Hi,

I am facing an issue where I cannot query historical data that I imported successfully using promtail. I deployed the loki-distributed Helm chart in the most recent version as of today.
I have a sample log with 1 million log lines in proper JSON, which I sorted by timestamp to avoid out-of-order issues. I was able to import them and saw the subdirectory for the tenant (I am using multi-tenancy) in my S3 bucket on AWS. So far so good.
All my queries either return no result at all or end in an error, sometimes even an HTTP 502.
I tried both Grafana and logcli with the correct tenant header.
Grafana suggests the correct labels that were imported but does not show any data.
When I import the lines with the timestamp of the import instead of the original one, I can query all data, except that the timestamps are then wrong of course.
The timestamps of the logs vary from 1970-01-01 (obviously a mistake "someone" made :) ) to a week ago.
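(For context, sorting newline-delimited JSON by a timestamp field can be done with a jq one-liner like the one below; the .timestamp field name and the file names are assumptions, adjust them to your log format.)

jq -s -c 'sort_by(.timestamp) | .[]' unsorted.json > sorted.json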

Example queries that did not work are:

logcli query --org-id="my-tenant" '{mylabelkey="mylabelvalue"}' --from="2009-01-01T00:00:00Z" --to="2009-12-23T10:00:00Z"
logcli query --org-id="my-tenant" '{mylabelkey="mylabelvalue"}' --since="20000h"
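For completeness, the equivalent raw API call that logcli issues looks roughly like this; the X-Scope-OrgID header carries the tenant, and start/end are the same 2009 range expressed as nanosecond epochs (host as in the examples above):

curl -s -G -H "X-Scope-OrgID: my-tenant" "https://my-loki/loki/api/v1/query_range" \
  --data-urlencode 'query={mylabelkey="mylabelvalue"}' \
  --data-urlencode 'start=1230768000000000000' \
  --data-urlencode 'end=1261562400000000000'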

I already checked max_query_lookback, which is not set and therefore defaults to 0 (no limit). I couldn't find any other config option that seemed like an obvious culprit.
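(For reference, the limits the running process actually applies can be inspected via Loki's /config endpoint, e.g. on the querier; the service name below is an assumption based on the chart's naming and needs a port-forward or an in-cluster shell.)

curl -s http://loki-loki-distributed-querier:3100/config | grep -A 1 max_query_lookback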

Any ideas?

Thanks for your help.

Not sure what’s happening, but there are a couple of things I’d recommend checking:

  1. Do you perhaps have the compactor running with a retention policy whose retention window is shorter than the age of your log timestamps? See the snippet after this list for what such a policy looks like.
  2. Are you sure the data was imported properly? Do you see any errors on the ingester?
  3. Do you see any errors on the querier?
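For reference on point 1, a compactor retention policy looks roughly like this; if retention_period is shorter than the age of your imported log timestamps, those chunks get deleted and queries over that range come back empty (values here are only examples):

compactor:
  retention_enabled: true
  shared_store: aws
limits_config:
  retention_period: 744h   # anything older than ~31 days would be removed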

I finally made some progress and now get an actual error message. :)

The messages were imported without any error message. Before I sorted them I received a lot of errors, so I think they are now imported properly.

With logcli I can also get the labels.

❯ logcli labels --org-id="my-tenant" --from="1970-01-01T00:00:00Z"
2023/02/28 09:01:40 https://my-loki/loki/api/v1/labels?end=1677571300957266000&start=0
label1
label2

When querying now, I see the following message in the querier log.

msg="error fetching chunks" err="failed to get s3 object: NoSuchKey: The specified key does not exist.\n\tstatus code: 404,

When I query a shorter period, like a year, I get this message as a query response as well.

I assume that the querier cannot access the S3 bucket. That is quite surprising, since the tenant folder was created in the bucket.
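(One way to check whether the chunk objects the querier is looking for actually exist is to list the bucket per tenant; bucket name as in my config below, and index/ is the default boltdb-shipper prefix.)

aws s3 ls s3://my-bucket/my-tenant/ --recursive | head
aws s3 ls s3://my-bucket/index/ --recursive | head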

This is my config. Is there any obvious culprit I am missing?

auth_enabled: true
chunk_store_config:
  max_look_back_period: 0s
common:
  compactor_address: http://loki-loki-distributed-compactor:3100
compactor:
  shared_store: aws
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  tail_proxy_url: http://loki-loki-distributed-querier:3100
frontend_worker:
  frontend_address: loki-loki-distributed-query-frontend:9095
  parallelism: 10
ingester:
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 30m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_transfer_retries: 0
  query_store_max_look_back_period: 0
  wal:
    dir: /var/loki/wal
limits_config:
  enforce_metric_name: true
  ingestion_rate_mb: 75
  max_cache_freshness_per_query: 10m
  max_query_length: 0
  max_query_lookback: 0
  per_stream_rate_limit: 25MB
  query_timeout: 15m
  reject_old_samples: false
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 24h
memberlist:
  join_members:
  - loki-loki-distributed-memberlist
querier:
  max_concurrent: 10
  multi_tenant_queries_enabled: true
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        ttl: 24h
ruler:
  alertmanager_url: https://alertmanager.xx
  external_url: https://alertmanager.xx
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  storage:
    local:
      directory: /etc/loki/rules
    type: local
runtime_config:
  file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
  configs:
  - from: "1970-01-01"
    index:
      period: 24h
      prefix: index_
    object_store: aws
    schema: v12
    store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  aws:
    bucketnames: my-bucket
    http_config:
      idle_conn_timeout: 10m
      response_header_timeout: 0
    region: eu-west-1
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
    index_gateway_client:
      server_address: dns:///loki-loki-distributed-index-gateway:9095
    resync_interval: 5s
    shared_store: aws
  filesystem:
    directory: /var/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

I thought this might also be related to failed to get s3 object: NoSuchKey: The specified key does not exist · Issue #6590 · grafana/loki · GitHub. The solution suggested there did not help though.

This is my most recent result: I can fetch the labels, but not the actual data.

❯ logcli labels --org-id="my-tenant" --from="1970-01-01T00:00:00Z"

2023/02/28 18:12:53 https://my-loki/loki/api/v1/labels?end=1677604373467159000&start=0
label1
label2
label3

❯ logcli query --org-id="my-tenant" --from="1970-01-01T00:00:00Z" --limit=5 '{label1="my-tenant"}'
2023/02/28 18:13:40 https://my-loki/loki/api/v1/query_range?direction=BACKWARD&end=1677604420542177000&limit=5&query=%7Blabel1%3D%22my-tenant%22%7D&start=0
2023/02/28 18:13:56 Error response from server: failed to get s3 object: NoSuchKey: The specified key does not exist.
status code: 404, request id: FCREYBYMEHFS9P5E, host id: RnmSKnZbdym0gnSHnEZcyGU+yX6y8P+CXJYcq9M07W7bIyiU2fcnOLUFul95rzhulQ/HVoDfLEo=
(<nil>) attempts remaining: 0
2023/02/28 18:13:56 Query failed: Run out of attempts while querying the server; response: failed to get s3 object: NoSuchKey: The specified key does not exist.
status code: 404, request id: FCREYBYMEHFS9P5E, host id: RnmSKnZbdym0gnSHnEZcyGU+yX6y8P+CXJYcq9M07W7bIyiU2fcnOLUFul95rzhulQ/HVoDfLEo=

Have you verified the chunks are actually written to S3?

I see you have this in your configuration:

filesystem:
    directory: /var/loki/chunks

That would imply your chunks are stored on the local filesystem.

Never mind, I see your schema_config. Have you checked your instance profile and IAM permissions on the read path (querier)?
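If this runs on EKS with IRSA, a quick sanity check is to look at the AWS env vars injected into the querier pod (the deployment name here is an assumption, adjust it to your release):

kubectl exec deploy/loki-loki-distributed-querier -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'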

I am using one service account for all pods. I can see that data is created in the S3 storage, and the ingester logs that it is uploading chunks. The compactor has access too, with the very same service account; I can tell because of the newly created files with "compactor" in their names. Is there a way to define different access for the querier? I thought they all share the storage config and the service account.
The policy allows wildcard operations on the bucket and its contents.
Could it be because of the epoch-0 timestamps in my "from"? I will try to remove those lines and re-import the logs tomorrow.

I am not sure if this is possible, but you can try setting a common storage. We don’t use storage_config; instead we put storage under common (which implicitly gets applied to all components), like this:

common:
  storage:
    s3:
      bucketnames: {{ loki_storage_config_aws_bucketname }}
      region: {{ aws_region }}
      sse_encryption: true
      s3forcepathstyle: true

Maybe try that. Also, here is the IAM policy that we use for all Loki components, for your reference:

Statement = [
      {
        Action   = [
          "s3:DeleteObject",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:PutObject"
        ]
        Effect   = "Allow"
        Resource = [
          "<S3_Bucket_URN>",
          "<S3_Bucket_URN>/*"
        ]
      }
    ]

I think I found the culprit. I just replaced all 1970-01-01 timestamps with 1971-01-01 and voilà, it works.
In my query I still use --from="1970-01-01T00:00:00Z", but now I don’t receive an error.
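(In case someone else hits this: the replacement was just a plain text substitution on the exported file before re-importing, something along these lines, assuming ISO-formatted timestamp strings in the JSON; file names are placeholders.)

sed 's/1970-01-01/1971-01-01/g' sorted.json > fixed.json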

I had to disable the gateway pod because I couldn’t adjust its timeout; that was the next issue I ran into. Now I use the ingress without the gateway and a timeout of 600s.
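(For reference, the 600s timeout is set on the ingress via annotations roughly like the following; this assumes ingress-nginx, other controllers have their own equivalents.)

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"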

This is the config that works this very moment.

auth_enabled: true
chunk_store_config:
  max_look_back_period: 0s
common:
  compactor_address: http://loki-loki-distributed-compactor:3100
  storage:
    s3:
      bucketnames: my-bucket
      http_config:
        idle_conn_timeout: 10m
        response_header_timeout: 0
      region: eu-west-1
      sse_encryption: true
compactor:
  retention_enabled: false
  shared_store: s3
distributor:
  ring:
    kvstore:
      store: memberlist
frontend:
  compress_responses: true
  log_queries_longer_than: 5s
  tail_proxy_url: http://loki-loki-distributed-querier:3100
frontend_worker:
  frontend_address: loki-loki-distributed-query-frontend:9095
  parallelism: 25
ingester:
  chunk_block_size: 262144
  chunk_encoding: snappy
  chunk_idle_period: 30m
  chunk_retain_period: 0s
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1
  max_transfer_retries: 0
  query_store_max_look_back_period: 0
  wal:
    dir: /var/loki/wal
limits_config:
  enforce_metric_name: false
  ingestion_rate_mb: 75
  max_cache_freshness_per_query: 10m
  max_query_length: 0
  max_query_lookback: 0
  per_stream_rate_limit: 25MB
  query_timeout: 15m
  reject_old_samples: false
  reject_old_samples_max_age: 168h
  split_queries_by_interval: 24h
memberlist:
  join_members:
  - loki-loki-distributed-memberlist
querier:
  engine:
    timeout: 15m
  max_concurrent: 25
  multi_tenant_queries_enabled: true
query_range:
  align_queries_with_step: true
  cache_results: true
  max_retries: 5
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        ttl: 24h
ruler:
  alertmanager_url: https://alertmanager.xx
  external_url: https://alertmanager.xx
  ring:
    kvstore:
      store: memberlist
  rule_path: /tmp/loki/scratch
  storage:
    local:
      directory: /etc/loki/rules
    type: local
runtime_config:
  file: /var/loki-distributed-runtime/runtime.yaml
schema_config:
  configs:
  - from: "1970-01-01"
    index:
      period: 24h
      prefix: index_
    object_store: s3
    schema: v12
    store: boltdb-shipper
server:
  grpc_server_max_recv_msg_size: 20388078
  grpc_server_max_send_msg_size: 20388078
  http_listen_port: 3100
  http_server_read_timeout: 15m
  http_server_write_timeout: 15m
storage_config:
  boltdb_shipper:
    active_index_directory: /var/loki/boltdb-shipper-active
    build_per_tenant_index: true
    cache_location: /var/loki/boltdb-shipper-cache
    cache_ttl: 24h
    index_gateway_client:
      server_address: dns:///loki-loki-distributed-index-gateway:9095
    resync_interval: 5s
    shared_store: s3
  filesystem:
    directory: /var/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

Thank you very much @tonyswumac.
