irmin-pack: error unknown inode key #2276

Open
metanivek opened this issue Sep 29, 2023 · 0 comments
Labels
tezos-support Support for bugs related to Tezos

Comments

@metanivek
Member

Tezos is experiencing a periodic issue on its testnet (it has not been observed on mainnet):

Sep 25 20:38:10.105:   Error:
Sep 25 20:38:10.105:     {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.105: 
Sep 25 20:38:10.106: worker crashed [validator-peer]:
Sep 25 20:38:10.106:   Error:
Sep 25 20:38:10.106:     {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.106: 
Sep 25 20:38:10.120: fetching branch of 3 blocks from peer idtw8Zm1g1bUJEPyqDP6tC8EvjFv4u
Sep 25 20:38:10.120: validation of block BLEzAQFH6kzPu3Q9wKpCK68UscAhfMQwGZHLrK7e3NSuzdwaHDr
Sep 25 20:38:10.120:   failed but precheck succeeded,
Sep 25 20:38:10.120:   Request pushed on 2023-09-25T20:38:10.120-00:00, treated in 4.988us, completed in 77.456us:
Sep 25 20:38:10.120:   {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.120: unexpected error while fetching headers:
Sep 25 20:38:10.120:   {"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",22250291624,360]}: unknown inode key (find_value)
Sep 25 20:38:10.122: request
Sep 25 20:38:10.122:   New branch BLm8Dx1TwWBN7DA2YfiULePTR7z5BaVAFDnvcqvLshe3GySoAeZ, locator length 423475
Sep 25 20:38:10.122:   failed
Sep 25 20:38:10.122:   (Request pushed on 2023-09-25T20:38:10.119-00:00, treated in 253us, completed in 2.519ms):

We do not yet have reproduction steps, but here is what we know:

  1. Has only been observed on testnet
  2. Happens roughly at the same time across nodes that are run by the foundation
  3. These nodes regularly export snapshots for network backup
  4. It is unclear whether the error is coming from the read-write instance or the read-only instance
  5. Running irmin 3.7.2

When this occurred earlier this year, we guessed it might be related to GC and read-only instances not properly clearing the LRU. This was fixed in #2200.

We think there could be a bug in reading data that hasn't been flushed to disk, since the append-only file does not allow reading beyond the last flushed offset. This should never happen, because higher-level code in pack_store prevents it with the staging hash table (although this could use some inspection). Having a control file for the node at the time of this error would help us disprove this idea, since we could compare the last written offset in the control file to the offset in the key that is logged.
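
As a rough illustration of that comparison, here is a minimal Python sketch. It only parses the key as it appears in the log lines above and checks whether the entry would extend past a given offset. The helper names (`parse_direct_key`, `is_beyond_flushed`) and the `last_flushed_offset` parameter are hypothetical; the sketch does not read a real control file, and it assumes the value taken from the control file is expressed in the same offset space as the offset in the logged key, which would need to be confirmed against irmin-pack.

```python
import json


def parse_direct_key(logged: str):
    """Extract (hash, offset, length) from a logged key such as
    {"Direct":["<hash>",<offset>,<length>]}: unknown inode key (find_value)
    """
    # raw_decode parses the leading JSON value and ignores the trailing text.
    obj, _end = json.JSONDecoder().raw_decode(logged.strip())
    hash_, offset, length = obj["Direct"]
    return hash_, int(offset), int(length)


def is_beyond_flushed(offset: int, length: int, last_flushed_offset: int) -> bool:
    """Return True if the entry [offset, offset + length) ends after the
    last flushed offset.

    Assumption: last_flushed_offset comes from the control file and uses the
    same offset space as the key's offset.
    """
    return offset + length > last_flushed_offset


if __name__ == "__main__":
    line = (
        '{"Direct":["CoUnviMUq75Qy4i43Y18zSYf9TTTrCyrMHRyif4hWQLwA3EjD5kz",'
        "22250291624,360]}: unknown inode key (find_value)"
    )
    _hash, offset, length = parse_direct_key(line)
    # Placeholder value; the real one would come from the node's control file.
    last_flushed_offset = 0
    print(offset, length, is_beyond_flushed(offset, length, last_flushed_offset))
```

If the check returned true for the control file captured at the time of the error, that would be consistent with the unflushed-read idea; if it returned false, the key would point into data that was already flushed and the cause would lie elsewhere.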

@metanivek metanivek added the tezos-support Support for bugs related to Tezos label Sep 29, 2023