Describe the bug
In v3.0.0, Loki quietly drops some log lines when the query includes a `line_format` stage.
To Reproduce
Steps to reproduce the behavior:
1. Start Loki v3.0.0 (and v2.7.5 for comparison)
2. Start Promtail v2.7.5
3. Ingest a significant number of logs
4. Query a specified time range with a `line_format` stage, like:
   `{image="standalone/llms:c38c6aa8f223b1aa86e1f0726362b15432c8697f", job="container_logs"} |= "saved checkpoint at batch 75" | json | line_format .`
5. Query the same time range without the `line_format` stage:
   `{image="standalone/llms:c38c6aa8f223b1aa86e1f0726362b15432c8697f", job="container_logs"} |= "saved checkpoint at batch 75" | json`
Observe that, for the exact same time range and ingested logs:
- v2.7.5 with and without the line format gives 1800 logs
- v3.0.0 without the line format gives 1800 logs
- v3.0.0 with the line format gives only 1793 logs
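To take Grafana's rendering out of the picture when comparing counts, the same query can be replayed against Loki's `query_range` HTTP API and the returned entries tallied. A minimal sketch (the host name is a placeholder, and the TLS/client-cert handling our deployment needs is omitted):

```python
from urllib.parse import urlencode

def build_query_url(base, query, start_ns, end_ns, limit=100000):
    """Build a Loki /loki/api/v1/query_range URL for replaying the query."""
    params = urlencode({
        "query": query,
        "start": start_ns,  # nanosecond unix timestamps
        "end": end_ns,
        "limit": limit,
    })
    return f"{base}/loki/api/v1/query_range?{params}"

def count_entries(response):
    """Total log lines in a query_range response.

    The API returns one "values" list per matching stream, so the total
    is the sum of their lengths.
    """
    return sum(len(stream["values"]) for stream in response["data"]["result"])
```

Fetching the two URLs (with and without `line_format`) and comparing `count_entries` on the parsed responses should show the same 1800-vs-1793 discrepancy if the drop happens server-side.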
Expected behavior
Neither the Loki version nor the `line_format` pipeline stage should change the number of log lines returned.
Environment:
Infrastructure: bare metal
Deployment tool: the images are run directly in Docker, started via Ansible.
```yaml
schema_config:
  configs:
    - from: 2023-08-01
      store: boltdb-shipper
      object_store: s3
      schema: v11
      index:
        prefix: loki_
        period: 24h
    - from: 2024-04-23
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h

common:
  path_prefix: "/loki"
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: "inmemory"
  instance_addr: 127.0.0.1

storage_config:
  tsdb_shipper:
    active_index_directory: "/loki/tsdb-index"
    cache_location: "/loki/tsdb-cache"
    cache_ttl: 168h
  boltdb_shipper:
    active_index_directory: "/loki/index"
    cache_location: "/loki/boltdb-cache"
    cache_ttl: 168h
  aws:
    s3: "s3://us-west-2/test-int8-logs"
    # TODO: we will want to replace these with a dedicated set at some point
    access_key_id: {{ lookup('ansible.builtin.env', 'AWS_ACCESS_KEY_ID', default=Undefined) }}
    secret_access_key: {{ lookup('ansible.builtin.env', 'AWS_SECRET_ACCESS_KEY', default=Undefined) }}
  index_queries_cache_config: &general_cache_config
    default_validity: 168h
    embedded_cache: &general_embedded_cache_config
      enabled: true
      max_size_mb: 4096 # 4gb should be plenty with our ~2.5gb of logs per day, and still manageable even duplicated 5 places.
      ttl: 168h

# Here, auth refers specifically to the X-Scope-OrgID header, which we don't use.
auth_enabled: false

server:
  http_tls_config:
    cert_file: "/loki/certs/server.crt"
    key_file: "/loki/certs/server.key"
    client_ca_file: "/loki/certs/root.crt"
    # For debugging, we can set this to "VerifyClientCertIfGiven" temporarily to get at the metrics and API in a browser.
    client_auth_type: "VerifyClientCertIfGiven"
  grpc_server_max_recv_msg_size: 67108864 # 64mb, default 4mb
  grpc_server_max_send_msg_size: 67108864
  http_server_read_timeout: 2m
  http_server_write_timeout: 2m
  log_level: debug

memberlist:
  node_name: singleton

limits_config:
  # We're actually pretty good about being <1kb per log line, so this corresponds to 100mb, which is large but not massive.
  # We specifically want a higher value than the default because Grafana doesn't really do pagination, so this is the max
  # that can be displayed there. Most queries should probably be limited to ~10k lines.
  max_entries_limit_per_query: 100000
  # The default of 32 is quite low - Grafana serves queries for everyone
  max_query_parallelism: 512
  # Default of 1h shards things far too thin (4k requests for a single month-long query). 6h is much more reasonable.
  split_queries_by_interval: 6h
  # We hit the default of 10k streams per user :(
  max_streams_per_user: 0
  max_global_streams_per_user: 0
  allow_structured_metadata: false
  ingestion_rate_mb: 12
  ingestion_burst_size_mb: 48
  max_line_size: 16KB
  max_line_size_truncate: true

frontend:
  # Default of 100 is far too low
  max_outstanding_per_tenant: 2048
  log_queries_longer_than: 20s

frontend_worker:
  grpc_client_config: &grpc_client_config
    max_send_msg_size: 67108864 # 64mb, default 16mb
    max_recv_msg_size: 67108864

querier:
  # Default of 10 is quite low. We've got beefy machines.
  max_concurrent: 20

query_scheduler:
  max_outstanding_requests_per_tenant: 10000
  grpc_client_config: *grpc_client_config

query_range:
  # https://github.com/grafana/loki/issues/4613#issuecomment-1021421653
  parallelise_shardable_queries: false
  align_queries_with_step: true
  cache_results: true
  results_cache:
    cache: *general_cache_config
  cache_instant_metric_results: true
  instant_metric_query_split_align: true
  instant_metric_results_cache:
    cache: *general_cache_config

chunk_store_config:
  chunk_cache_config:
    default_validity: 168h
    embedded_cache:
      <<: *general_embedded_cache_config
      max_size_mb: 40960 # pretty big 40gb so we can just keep a whole week's worth of logs in memory. Why not?
  write_dedupe_cache_config: *general_cache_config
  # It's really unclear from the docs what this number is actually supposed to be. Just the point at which
  # we're confident logs have been ingested?
  cache_lookups_older_than: 1h
```
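As a quick sanity check on the size comments in the config above, the byte counts do match their annotations:

```python
# grpc_server_max_recv_msg_size / grpc_server_max_send_msg_size: "64mb"
assert 64 * 1024 * 1024 == 67108864

# Embedded query cache: 4096 MB ("4gb"); chunk cache override: 40960 MB ("40gb")
assert 4096 * 1024 * 1024 == 4 * 1024**3
assert 40960 * 1024 * 1024 == 40 * 1024**3
```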
It's a single replica deployment.
Screenshots, Promtail config, or terminal output
I've posted screenshots and dumps of logs in the community slack: https://grafana.slack.com/archives/C06RUQJHTHQ/p1713891805440339
The Loki config we're using for v3.0.0 is the one shown above under Environment.
Querying v3.0.0 without the line format gives 1800 lines:
https://files.slack.com/files-pri/T05675Y01-F06V8NGCVB9/download/screenshot_2024-04-23_at_12.03.56___pm.png?origin_team=T05675Y01
https://files.slack.com/files-pri/T05675Y01-F0706C4AS85/download/explore-v3.0.0-no-line-format.json?origin_team=T05675Y01
v3.0.0 with the line format gives 1793 lines:
https://grafana.slack.com/files/U06V4UJKA7N/F0706CB2A8M/screenshot_2024-04-23_at_12.05.42___pm.png?origin_team=T05675Y01&origin_channel=C05675Y4F
https://files.slack.com/files-pri/T05675Y01-F070Y3GHSDN/download/explore-v3.0.0-with-line-format.json?origin_team=T05675Y01
v2.7.5 without the line format is 1800 lines:
https://grafana.slack.com/files/U06V4UJKA7N/F0709A4M34J/screenshot_2024-04-23_at_12.06.52___pm.png?origin_team=T05675Y01&origin_channel=C05675Y4F
https://files.slack.com/files-pri/T05675Y01-F070Y3NP4U8/download/explore-v2.7.5-no-line-format.json?origin_team=T05675Y01
And v2.7.5 with the line format is 1800 lines:
https://grafana.slack.com/files/U06V4UJKA7N/F0706CTN38V/screenshot_2024-04-23_at_12.08.55___pm.png?origin_team=T05675Y01&origin_channel=C05675Y4F
https://files.slack.com/files-pri/T05675Y01-F070Y3ZBERW/download/explore-v2.7.5-with-line-format.json?origin_team=T05675Y01
To get good diffs you have to remove all the "service_name" fields from the exported lines and format the JSON.
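That cleanup can be scripted; a rough sketch (the file paths are hypothetical) that drops every "service_name" field and re-serializes with stable formatting so the exports diff cleanly:

```python
import json

def strip_service_name(obj):
    """Recursively drop "service_name" keys so they don't pollute the diff."""
    if isinstance(obj, dict):
        return {k: strip_service_name(v) for k, v in obj.items() if k != "service_name"}
    if isinstance(obj, list):
        return [strip_service_name(v) for v in obj]
    return obj

def normalize(path_in, path_out):
    """Rewrite an Explore export with sorted keys and fixed indentation."""
    with open(path_in) as f:
        data = json.load(f)
    with open(path_out, "w") as f:
        json.dump(strip_service_name(data), f, indent=2, sort_keys=True)
```

Run it over both Explore exports, then `diff` the two outputs.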