Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki v3.0.0 brokes storage after restart #12811

Open
catap opened this issue Apr 28, 2024 · 2 comments
Open

Loki v3.0.0 brokes storage after restart #12811

catap opened this issue Apr 28, 2024 · 2 comments
Labels
3.0 type/bug Somehing is not working as expected

Comments

@catap
Copy link

catap commented Apr 28, 2024

Describe the bug
I've restarted Loki v3.0.0 via rcctl restart loki and it doesn't start.

It had incoming logs, but a flow was a few hundred messages per minute. So, nothin.

To Reproduce
Steps to reproduce the behavior:

  1. Started Loki v3.0.0
  2. Wait some time to start writing incomming logs
  3. Restart it.

It won't start with error:

ts=2024-04-28T21:12:06.334387782Z caller=spanlogger.go:109 level=info msg="building table names cache"
ts=2024-04-28T21:12:06.334394625Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=4.248µs
level=info ts=2024-04-28T21:12:06.334403231Z caller=table_manager.go:273 index-store=tsdb-2020-10-24 msg="query readiness setup completed" duration=862ns distinct_users_len=0 distinct_users=
level=info ts=2024-04-28T21:12:06.334412409Z caller=shipper.go:160 index-store=tsdb-2020-10-24 msg="starting index shipper in RW mode"
level=info ts=2024-04-28T21:12:06.335066695Z caller=head_manager.go:308 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="loaded wals by period" groups=0
level=info ts=2024-04-28T21:12:06.335104356Z caller=manager.go:86 index-store=tsdb-2020-10-24 component=tsdb-manager msg="loaded leftover local indices" err=null successful=true buckets=0 indices=0 failures=0
level=info ts=2024-04-28T21:12:06.335123853Z caller=head_manager.go:308 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="loaded wals by period" groups=2
level=error ts=2024-04-28T21:12:06.339493286Z caller=log.go:216 msg="error running loki" err="invalid checksum\nread symbols\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).finishSymbols\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:710\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).ensureStage\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:398\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).AddSeries\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:457\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*Builder).Build\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/builder.go:167\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*tsdbManager).buildFromHead\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/manager.go:221\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*tsdbManager).BuildFromWALs\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shi

Expected behavior
It should restart!

Environment:

  • Infrastructure: OpenBSD 7.5 bare metal
  • Deployment tool: installed from ports

Screenshots, Promtail config, or terminal output
Config:

auth_enabled: false

server:
  log_level: info

common:
  instance_addr: 127.0.0.1
  path_prefix: /var/loki
  storage:
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

run under ktrace allows to ping point the file with issue:

 97609 loki     CALL  open(0xc0000c0180,0x10000<O_RDONLY|O_CLOEXEC>)
 97609 loki     NAMI  "/var/loki/tsdb-shipper-active/scratch/filesystem_2020-10-24/index-52bdedf2c48a674c.staging"
 97609 loki     RET   open 11/0xb
 97609 loki     CALL  fstat(11,0xc00088fa58)
 97609 loki     STRU  struct stat { dev=1068, ino=156790377, mode=-rw-r--r-- , nlink=1, uid=848<"_loki">, gid=0<"wheel">, rdev=627163797, atime=1714338762<"
Apr 28 23:12:42 2024">.629669528, mtime=1714338762<"Apr 28 23:12:42 2024">.639668584, ctime=1714338762<"Apr 28 23:12:42 2024">.639668584, size=1968, blocks=
4, blksize=16384, flags=0x0, gen=0x0 }
 97609 loki     RET   fstat 0
 97609 loki     CALL  fstat(11,0xc000546278)
 97609 loki     STRU  struct stat { dev=1068, ino=156790377, mode=-rw-r--r-- , nlink=1, uid=848<"_loki">, gid=0<"wheel">, rdev=627163797, atime=1714338762<"
Apr 28 23:12:42 2024">.629669528, mtime=1714338762<"Apr 28 23:12:42 2024">.639668584, ctime=1714338762<"Apr 28 23:12:42 2024">.639668584, size=1968, blocks=
4, blksize=16384, flags=0x0, gen=0x0 }
 97609 loki     RET   fstat 0
 97609 loki     CALL  mmap(0,0x7b0,0x1<PROT_READ>,0x1<MAP_SHARED>,11,0)
 97609 loki     RET   mmap 8752971776/0x209b7c000
 97609 loki     CALL  pwrite(12,0xc002796000,0x4,0x7ac)
 97609 loki     GIO   fd 12 wrote 4 bytes
       "\M-]\^W\^_d"
 97609 loki     RET   pwrite 4
 97609 loki     CALL  write(2,0xc0008a2500,0x2056)
 97609 loki     GIO   fd 2 wrote 4096 bytes
       "invalid checksum
        read symbols
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).finishSymbols
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:710
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).ensureStage
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:398
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).AddSeries
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:457
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*Builder).Build
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/builder.go:167
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*tsdbManager).buildFromHead
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/manager.go:221
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*tsdbManager).BuildFromWALs
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/manager.go:294
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*HeadManager).buildTSDBFromWALs
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/head_manager.go:317
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*HeadManager).Start
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/head_manager.go:283
        github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*store).init
                /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/store.go:123

if I remove it, it creates a brand new and fails the same way

Anyway, removing / moving wal via doas mv /var/loki/tsdb-shipper-active/wal /var/loki/tsdb-shipper-active/wal_ allows to start it.

Anyway, logs for this day aren't available. Thus, futre restart leads to the same issue:

Apr 28 23:24:30 island loki[25756]: level=info ts=2024-04-28T21:24:30.139728346Z caller=module_service.go:120 msg="module stopped" module=ring
Apr 28 23:24:30 island loki[25756]: level=info ts=2024-04-28T21:24:30.139768192Z caller=module_service.go:120 msg="module stopped" module=memberlist-kv
Apr 28 23:24:30 island loki[25756]: level=info ts=2024-04-28T21:24:30.140544548Z caller=modules.go:1770 msg="server stopped"
Apr 28 23:24:30 island loki[25756]: level=info ts=2024-04-28T21:24:30.140557362Z caller=module_service.go:120 msg="module stopped" module=server
Apr 28 23:24:30 island loki[25756]: level=info ts=2024-04-28T21:24:30.140570517Z caller=loki.go:503 msg="Loki stopped" running_time=2m47.575854342s
Apr 28 23:24:49 island loki[60826]: invalid checksum
Apr 28 23:24:49 island loki[60826]: read symbols
Apr 28 23:24:49 island loki[60826]: github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).finishSymbols
Apr 28 23:24:49 island loki[60826]:     /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.g
o:710
Apr 28 23:24:49 island loki[60826]: github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).ensureStage
Apr 28 23:24:49 island loki[60826]:     /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.g
o:398
Apr 28 23:24:49 island loki[60826]: github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).AddSeries
Apr 28 23:24:49 island loki[60826]:     /usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.g
o:457
...
Apr 28 23:24:49 island loki[60826]: runtime.main
Apr 28 23:24:49 island loki[60826]:     /usr/local/go/src/runtime/proc.go:271
Apr 28 23:24:49 island loki[60826]: runtime.goexit
Apr 28 23:24:49 island loki[60826]:     /usr/local/go/src/runtime/asm_amd64.s:1695
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.969236222Z caller=main.go:120 msg="Starting Loki" version="(version=, branch=, revision=unknown)"
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.970291516Z caller=server.go:354 msg="server listening on addresses" http=0.0.0.0:3100 grpc=0.0.0.0:9095
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.972956059Z caller=table_manager.go:136 index-store=tsdb-2020-10-24 msg="uploading tables"
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.973112113Z caller=table_manager.go:429 index-store=tsdb-2020-10-24 msg="loading local table index_19839"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973263609Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973403774Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=135.005M-BM-5s
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973420255Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973464128Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=40.556M-BM-5s
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973505396Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973546433Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=38.112M-BM-5s
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.973559759Z caller=table_manager.go:429 index-store=tsdb-2020-10-24 msg="loading local table index_19840"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973604243Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973646573Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=39.425M-BM-5s
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973674315Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973714992Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=37.701M-BM-5s
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.97372471Z caller=table_manager.go:429 index-store=tsdb-2020-10-24 msg="loading local table index_19841"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973766189Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973808719Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=38.172M-BM-5s
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973836922Z caller=spanlogger.go:109 level=info msg="building table names cache"
Apr 28 23:24:49 island loki[60826]: ts=2024-04-28T21:24:49.973877649Z caller=spanlogger.go:109 level=info msg="table names cache built" duration=37.851M-BM-5s
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.973889081Z caller=table_manager.go:273 index-store=tsdb-2020-10-24 msg="query readiness setup completed" duration=1.293M-BM-5s distinct_users_len=0 distinct_users=
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.973899861Z caller=shipper.go:160 index-store=tsdb-2020-10-24 msg="starting index shipper in RW mode"
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.974577291Z caller=head_manager.go:308 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="loaded wals by period" groups=0
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.974618458Z caller=manager.go:86 index-store=tsdb-2020-10-24 component=tsdb-manager msg="loaded leftover local indices" err=null successful=true buckets=0 indices=0 failures=0
Apr 28 23:24:49 island loki[60826]: level=info ts=2024-04-28T21:24:49.974637504Z caller=head_manager.go:308 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="loaded wals by period" groups=1
Apr 28 23:24:49 island loki[60826]: level=error ts=2024-04-28T21:24:49.984195127Z caller=log.go:216 msg="error running loki" err="invalid checksum\nread symbols\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).finishSymbols\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:710\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).ensureStage\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:398\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index.(*Writer).AddSeries\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/index/index.go:457\ngithub.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb.(*Builder).Build\n\t/usr/ports/pobj/loki-3.0.0/go/src/github.com/grafana/loki/pkg/storage/stores/shipper/indexshipper/tsdb/builder.go:167\ngithub.com/grafana/l
Apr 28 23:24:49 island loki[60826]: oki/pkg/storage/stores/shipper/indexshipper/tsdb.(*tsdbManager).buildFromHead\n\t/usr/ports/pobj/loki-3.0.0/go/src/githu
...
@catap
Copy link
Author

catap commented Apr 29, 2024

Well, it works but I can't get any logs from it and it produces same errors on runtime:

island$ zgrep -h level=error /var/log/daemon.4.gz | head                                                                                                   
Apr 29 08:00:42 island loki[76313]: level=error ts=2024-04-29T06:00:42.162944491Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:01:42 island loki[76313]: level=error ts=2024-04-29T06:01:42.161902872Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:02:42 island loki[76313]: level=error ts=2024-04-29T06:02:42.161864577Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:03:42 island loki[76313]: level=error ts=2024-04-29T06:03:42.163866697Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:04:42 island loki[76313]: level=error ts=2024-04-29T06:04:42.161827423Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:05:42 island loki[76313]: level=error ts=2024-04-29T06:05:42.162596935Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:06:42 island loki[76313]: level=error ts=2024-04-29T06:06:42.163367638Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:07:42 island loki[76313]: level=error ts=2024-04-29T06:07:42.162041682Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:08:42 island loki[76313]: level=error ts=2024-04-29T06:08:42.161788402Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 08:09:42 island loki[76313]: level=error ts=2024-04-29T06:09:42.164553823Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
island$ zgrep -h level=error /var/log/daemon | tail                                                                                                        
Apr 29 13:00:42 island loki[76313]: level=error ts=2024-04-29T11:00:42.170211154Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 13:01:42 island loki[76313]: level=error ts=2024-04-29T11:01:42.161846694Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 13:02:42 island loki[76313]: level=error ts=2024-04-29T11:02:42.162006924Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 13:03:42 island loki[76313]: level=error ts=2024-04-29T11:03:42.162478914Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
Apr 29 13:04:42 island loki[76313]: level=error ts=2024-04-29T11:04:42.164361007Z caller=head_manager.go:173 index-store=tsdb-2020-10-24 component=tsdb-head-manager msg="failed building tsdb head" period=1904825 err="building tsdb from head: read symbols: invalid checksum"
island$ 

@catap
Copy link
Author

catap commented Apr 29, 2024

Seems to be similar with #8564 and #8866

@JStickler JStickler added 3.0 type/bug Somehing is not working as expected labels Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.0 type/bug Somehing is not working as expected
Projects
None yet
Development

No branches or pull requests

2 participants