[Bug] node stalls syncing Error: Err.UNKNOWN_UNSPENT #17797

Open
epudwjuhc opened this issue Mar 27, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@epudwjuhc

What happened?

The node doesn't sync. When I replace the database with a known-good backup, it syncs some 200-3000 blocks and then fails again:

2024-03-27T17:26:41.240 full_node chia.full_node.full_node: ERROR Error: Err.UNKNOWN_UNSPENT, Invalid block from peer: PeerInfo(_ip=IPv4Address('58.183.125.25'), _port=8444)
2024-03-27T17:26:41.439 full_node full_node_server : WARNING Banning 58.183.125.25 for 600 seconds
2024-03-27T17:26:41.442 full_node chia.full_node.full_node: ERROR sync from fork point failed err: Failed to validate block batch 5134792 to 5134824
2024-03-27T17:27:58.593 full_node chia.consensus.block_body_validation: ERROR Err.UNKNOWN_UNSPENT: COIN ID: 0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c NPC RESULT

Happens with latest, 2.2.1 and 2.0.1 versions, tested them all.

Version

2.0.1

What platform are you using?

Linux

What ui mode are you using?

CLI

Relevant log output

2024-03-27T17:26:41.240 full_node chia.full_node.full_node: ERROR    Error: Err.UNKNOWN_UNSPENT, Invalid block from peer: PeerInfo(_ip=IPv4Address('58.183.125.25'), _port=8444)
2024-03-27T17:26:41.439 full_node full_node_server        : WARNING  Banning 58.183.125.25 for 600 seconds
2024-03-27T17:26:41.442 full_node chia.full_node.full_node: ERROR    sync from fork point failed err: Failed to validate block batch 5134792 to 5134824
2024-03-27T17:27:58.593 full_node chia.consensus.block_body_validation: ERROR    Err.UNKNOWN_UNSPENT: COIN ID: 0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c NPC RESULT
@epudwjuhc epudwjuhc added the bug Something isn't working label Mar 27, 2024
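
The two key facts in the log above are the missing coin ID and the block batch that failed validation. A minimal sketch for pulling both out of a log file follows; the regular expressions are assumptions based only on the line formats quoted in this report, and `summarize` is a hypothetical helper, not part of chia.

```python
import re

# Patterns assumed from the log lines quoted in this issue.
COIN_ERROR = re.compile(r"Err\.UNKNOWN_UNSPENT: COIN ID: ([0-9a-f]+)")
BATCH_ERROR = re.compile(r"Failed to validate block batch (\d+) to (\d+)")

def summarize(log_lines):
    """Collect missing coin IDs and failed (start, end) batch ranges from log lines."""
    coins, batches = [], []
    for line in log_lines:
        coin_match = COIN_ERROR.search(line)
        if coin_match:
            coins.append(coin_match.group(1))
        batch_match = BATCH_ERROR.search(line)
        if batch_match:
            batches.append((int(batch_match.group(1)), int(batch_match.group(2))))
    return coins, batches

sample = [
    "2024-03-27T17:26:41.442 full_node chia.full_node.full_node: ERROR sync from fork point failed err: Failed to validate block batch 5134792 to 5134824",
    "2024-03-27T17:27:58.593 full_node chia.consensus.block_body_validation: ERROR Err.UNKNOWN_UNSPENT: COIN ID: 0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c NPC RESULT",
]
coins, batches = summarize(sample)
print(coins, batches)
```
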
@epudwjuhc
Author

It starts to sync from a good database backup, then gets stuck:

while true; do sleep 10; chia show -s | grep Status ; done

Current Blockchain Status: Syncing 5134139/5134139 (0 behind).
Current Blockchain Status: Syncing 5134139/5134139 (0 behind).
Current Blockchain Status: Syncing 5134139/5138812 (4673 behind).
Current Blockchain Status: Syncing 5134139/5138812 (4673 behind).
Current Blockchain Status: Syncing 5134139/5138812 (4673 behind).
Current Blockchain Status: Syncing 5134139/5138812 (4673 behind).
Current Blockchain Status: Syncing 5134197/5138812 (4615 behind).
Current Blockchain Status: Syncing 5134281/5138812 (4531 behind).
Current Blockchain Status: Syncing 5134379/5138812 (4433 behind).
Current Blockchain Status: Syncing 5134465/5138812 (4347 behind).
Current Blockchain Status: Syncing 5134554/5138812 (4258 behind).
Current Blockchain Status: Syncing 5134652/5138812 (4160 behind).
Current Blockchain Status: Syncing 5134746/5138812 (4066 behind).
Current Blockchain Status: Syncing 5134792/5138819 (4027 behind).
Current Blockchain Status: Syncing 5134792/5138819 (4027 behind).
Current Blockchain Status: Syncing 5134792/5138819 (4027 behind).
Current Blockchain Status: Syncing 5134792/5138819 (4027 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).
Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).
Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).
Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).
Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138831 (4039 behind).
Current Blockchain Status: Syncing 5134792/5138831 (4039 behind).
Current Blockchain Status: Syncing 5134792/5138831 (4039 behind).
Current Blockchain Status: Syncing 5134792/5138831 (4039 behind).
Current Blockchain Status: Syncing 5134792/5138831 (4039 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Syncing 5134792/5138834 (4042 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138841 (4049 behind).
Current Blockchain Status: Syncing 5134792/5138841 (4049 behind).
Current Blockchain Status: Syncing 5134792/5138841 (4049 behind).
Current Blockchain Status: Syncing 5134792/5138841 (4049 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Exception ignored in: <coroutine object FullNode.sync_from_fork_point.<locals>.fetch_block_batches at 0x1521b3c56440>
Traceback (most recent call last):
  File "/nvm1/chia/src/chia-blockchain/chia/util/struct_stream.py", line 69, in __init__
    super().__init__()
RuntimeError: coroutine ignored GeneratorExit
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138845 (4053 behind).
Current Blockchain Status: Syncing 5134792/5138848 (4056 behind).
Exception ignored in: <coroutine object FullNode.sync_from_fork_point.<locals>.fetch_block_batches at 0x1521a73157c0>
Traceback (most recent call last):
  File "/nvm1/chia/src/chia-blockchain/chia/util/streamable.py", line 366, in parse_rust
    buf = f.getbuffer()
RuntimeError: coroutine ignored GeneratorExit
Current Blockchain Status: Syncing 5134792/5138848 (4056 behind).
Current Blockchain Status: Syncing 5134792/5138848 (4056 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Syncing 5134792/5138852 (4060 behind).
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Current Blockchain Status: Syncing 5134792/5138857 (4065 behind).
Exception ignored in: <coroutine object FullNode.sync_from_fork_point.<locals>.fetch_block_batches at 0x1521b1fadcc0>
Traceback (most recent call last):
  File "/nvm1/chia/src/chia-blockchain/chia/util/byte_types.py", line 35, in __init__
    super().__init__()
RuntimeError: coroutine ignored GeneratorExit
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Not Synced. Peak height: 5134792
Current Blockchain Status: Syncing 5134792/5138859 (4067 behind).
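
The output above shows the local height stuck at 5134792 while the target height keeps growing. As a rough sketch, the same conclusion can be drawn automatically from the `chia show -s | grep Status` lines; the regular expression is an assumption based on the format shown here, and `longest_stall` is a hypothetical helper.

```python
import re

# Matches lines such as "Current Blockchain Status: Syncing 5134792/5138834 (4042 behind)."
SYNCING = re.compile(r"Syncing (\d+)/(\d+) \((\d+) behind\)")

def longest_stall(lines):
    """Return (height, count) for the local height repeated in the longest run of "Syncing" lines."""
    best = (None, 0)
    current, run = None, 0
    for line in lines:
        match = SYNCING.search(line)
        if match is None:
            continue  # skip "Not Synced" lines and traceback noise
        height = int(match.group(1))
        if height == current:
            run += 1
        else:
            current, run = height, 1
        if run > best[1]:
            best = (current, run)
    return best

sample = [
    "Current Blockchain Status: Syncing 5134652/5138812 (4160 behind).",
    "Current Blockchain Status: Syncing 5134746/5138812 (4066 behind).",
    "Current Blockchain Status: Syncing 5134792/5138819 (4027 behind).",
    "Current Blockchain Status: Not Synced. Peak height: 5134792",
    "Current Blockchain Status: Syncing 5134792/5138824 (4032 behind).",
]
print(longest_stall(sample))  # (5134792, 2)
```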

@wjblanke
Contributor

wjblanke commented Apr 3, 2024

We think your coin_record table somehow got corrupted and lost a coin. Your best bet is to get a new database, either by downloading it from the torrent or by syncing from 0.

It's failing because block 5134793 needs this coin to be unspent. That's why you can't sync past it.

https://mojonode.com/explorer?q=select%20*%20from%20coin_records%20where%20name%20=%20%270x0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c%27
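
The link above runs `select * from coin_records where name = '0x0cc7...'` against Mojonode's hosted copy of the chain. A comparable check against a local database might look like the sketch below. It assumes the local table is named `coin_record` (as mentioned above) and has a BLOB key column `coin_name` plus `confirmed_index` and `spent_index`; those column names are assumptions that should be verified against the actual schema before relying on this.

```python
import sqlite3

COIN_ID = "0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c"

def find_coin(conn, coin_id_hex):
    """Return (confirmed_index, spent_index) for a coin, or None if the row is missing."""
    # Assumed schema: coin_record(coin_name BLOB PRIMARY KEY, confirmed_index, spent_index, ...)
    return conn.execute(
        "SELECT confirmed_index, spent_index FROM coin_record WHERE coin_name = ?",
        (bytes.fromhex(coin_id_hex),),
    ).fetchone()

# Stand-in database so the sketch runs anywhere. To check a real node, pass the
# path of a copy of its database file to sqlite3.connect() instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coin_record ("
    "coin_name BLOB PRIMARY KEY, confirmed_index INTEGER, spent_index INTEGER)"
)
print(find_coin(conn, COIN_ID))  # None here, because the stand-in table is empty
```

A `None` result on the real database would be consistent with the "lost a coin" explanation; a returned row would point elsewhere.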

@epudwjuhc
Author

> We are thinking that somehow your coin_record table got corrupted and lost a coin. Best bet would be to get a new database by downloading from torrent or syncing from 0.
>
> It's failing because 5134793 needs this coin to be unspent. That's why you can't sync past it
>
> https://mojonode.com/explorer?q=select%20*%20from%20coin_records%20where%20name%20=%20%270x0cc7d7663bf47dd61c280e178ab7ae9af068b2b51ab536ecf6d328942623487c%27

I have no clue how this could have happened. It was always syncing.

Downloading the complete database is unacceptable. If such issues are not fixed, the project will lose participants. Pull 180GB of data every time a bug corrupts the DB? This is not serious.

There must be a mechanism either to fix broken databases or to prune the blockchain. Think about it: at what database size does the project become unmanageable for most users? 100GB, 1TB, 10TB?

Can the DB be fixed by, e.g., vacuuming it only up to a certain sequence number (before the corruption point)?

@hbroer

hbroer commented Apr 4, 2024

Don't you run another node? You can simply copy the db from one machine to another. You can also do backups from time to time. Data corruption can always happen, especially on consumer hardware.

@epudwjuhc
Author

Nope, I don't run another one. It is not as simple as "just copy it from another machine". To my eye, such projects are started without planning ahead: "Oh, just use SQLite." If it were a real blockchain, it would be possible to just dd it and discard the corrupted tail (as with MPEG, for instance).

The DB size makes it unmanageable in the long term for most users if it keeps getting corrupted. If you google it, you find many users complaining about chia not syncing. Maybe it is time to think about a real solution to this instead of having to start from 0, eh?

What about a DB validation tool that can revive a DB that no longer syncs? I would say it can't be that difficult to write if you know the database format, though I didn't look into the details.

PS: I upgraded to the latest version and it doesn't even try to sync (it never shows a sync attempt).
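
On the validation question: SQLite itself ships a structural check, `PRAGMA integrity_check`. The sketch below runs it from Python; `integrity_problems` is a hypothetical helper, and the check is best run on a copy of the database with the node stopped, since it can take a long time on a file of this size. It reports damage to pages and indexes, so a row that is simply absent (as suspected here) would not necessarily show up.

```python
import sqlite3

def integrity_problems(path):
    """Run SQLite's built-in structural check; an empty list means it reported "ok"."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    return [] if messages == ["ok"] else messages

# An empty in-memory database stands in for the real file in this example.
print(integrity_problems(":memory:"))  # []
```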

@epudwjuhc
Author

Can you write an SQL script that discards the last X blocks from the DB? Are there any sequence numbers?

@epudwjuhc
Author

E.g., let's take a simple example of 3 blocks:

block1 [ coin1 coin2 coin3 ]
block2 [ coin1 coin2->spent coin3->spent ]
block3 [ other stuff ... ]

Let's assume (without my knowing the details of the chia format) that block1 is kept only because coin1 has no later transaction, and is otherwise completely useless. Would it be possible for the protocol to automatically send coin1 to the address it already belongs to and then discard block1? No future transaction would ever be able to reference that block, since all of its coins would have been moved into later transactions.

If I'm not mistaken, Nexellia uses something similar to keep the chain size reasonably small. Solutions DO exist.

I'm sure you can find a solution.

@wjblanke
Contributor

The fastest workaround is to download the DB from a torrent. Some error or edge case or corruption may have lost your coin. Unfortunately we haven't seen this issue from others so it may be impossible to reproduce. If you are able to reproduce it reliably, please let us know. Any issue we can reproduce should be fixable.

@esaung

esaung commented Apr 10, 2024

I ran into this as well on my full node today:
2024-04-10T11:53:05.838 full_node chia.consensus.block_body_validation: ERROR Err.UNKNOWN_UNSPENT: COIN ID: 18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561 fork_info: ForkInfo(fork_height=5201538, peak_height=5201538, peak_hash=b"p\xff'\xf3X\x15Oy\xa7$\xe9|=\x1fV\xcf{\x83\xda\x97\xbc\xc2\x8e\xc9\xab\xd2\xc0\xf0\xde\xf4K\x83", additions_since_fork={}, removals_since_fork={}, block_hashes=[])
2024-04-10T11:53:05.838 full_node chia.full_node.full_node: ERROR Error: Err.UNKNOWN_UNSPENT, Invalid block from peer: PeerInfo(_ip=IPv4Address('189.179.193.122'), _port=8444)
2024-04-10T11:53:05.838 full_node full_node_server : WARNING Banning 189.179.193.122 for 600 seconds
2024-04-10T11:53:05.838 full_node full_node_server : INFO Connection closed: 189.179.193.122, node id: acb517474c1fd148f245ff7f8838f7aa30d2afdd168c112db9255284806330b7
2024-04-10T11:53:05.838 full_node chia.full_node.full_node: INFO peer disconnected PeerInfo(_ip=IPv4Address('189.179.193.122'), _port=8444)
2024-04-10T11:53:05.854 full_node chia.full_node.full_node: ERROR sync from fork point failed: ValueError: Failed to validate block batch 5201538 to 5201569
Traceback (most recent call last):
  File "chia\util\log_exceptions.py", line 20, in log_exceptions
  File "chia\full_node\full_node.py", line 1176, in sync_from_fork_point
  File "chia\full_node\full_node.py", line 1150, in validate_block_batches
ValueError: Failed to validate block batch 5201538 to 5201569

@epudwjuhc
Author

> I ran into this as well on my full node today: 2024-04-10T11:53:05.838 full_node chia.consensus.block_body_validation: ERROR Err.UNKNOWN_UNSPENT: COIN ID:

To me it looks like updating first to 1.8.3 and then to 2.0.0 crippled my database. It was syncing before I decided to upgrade to the v2 format.

@epudwjuhc
Author

PS: I was running 1.6.x for a while before I did that upgrade.

@wjblanke
Contributor

Do you still have the v1 DB? Is this reproducible when you convert the DB from v1 to v2? If so that is really interesting and we should be able to fix the issue. A coin could be getting dropped during the migration.

@wjblanke
Contributor

wjblanke commented Apr 17, 2024

eming, can you ask your node for the additions for block 1729601?

chia rpc full_node get_additions_and_removals '{"header_hash": "0x5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae"}'

Let's see if it finds

{
    "coin": {
        "amount": 2,
        "parent_coin_info": "0xa530d92c501aba1faa2287f37aa997b202cb5a85d909ab96bee3e6a311df62e8",
        "puzzle_hash": "0x4bc6435b409bcbabe53870dae0f03755f6aabb4594c5915ec983acf12a5d1fba"
    },
    "coinbase": false,
    "confirmed_block_index": 1729601,
    "spent": true,
    "spent_block_index": 5201539,
    "timestamp": 1647942588
},

aka

18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561

https://alltheblocks.net/chia/coin/18c739bd4ef06e6059216141d8596c573d588dcda7fe67383e1abe25d1463561

We are trying to determine whether you have the block correctly and it just wasn't putting the coin into coin_records.
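
For readers wondering how the JSON record relates to the hex ID after "aka": a sketch under the assumption that a coin ID is the SHA-256 of the parent coin ID, the puzzle hash, and the amount encoded as a minimal big-endian signed integer. The helper names are hypothetical, and the printed value can be compared with the ID quoted above to test that assumption.

```python
import hashlib

def amount_to_bytes(amount: int) -> bytes:
    """Encode a non-negative amount as a minimal big-endian signed integer (assumed encoding)."""
    if amount == 0:
        return b""
    # One extra bit is reserved for the sign, so 128 needs two bytes rather than one.
    length = (amount.bit_length() + 8) // 8
    return amount.to_bytes(length, "big", signed=True)

def unhex(value: str) -> bytes:
    """Decode a hex string with or without a leading 0x."""
    return bytes.fromhex(value[2:] if value.startswith("0x") else value)

def coin_id(parent_coin_info: str, puzzle_hash: str, amount: int) -> str:
    """Hash the three coin fields in the assumed order and return the hex digest."""
    data = unhex(parent_coin_info) + unhex(puzzle_hash) + amount_to_bytes(amount)
    return hashlib.sha256(data).hexdigest()

# Field values copied from the coin record quoted above.
print(coin_id(
    "0xa530d92c501aba1faa2287f37aa997b202cb5a85d909ab96bee3e6a311df62e8",
    "0x4bc6435b409bcbabe53870dae0f03755f6aabb4594c5915ec983acf12a5d1fba",
    2,
))
```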

@epudwjuhc
Author

it prints:

chia rpc full_node get_additions_and_removals '{"header_hash": "0x5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae"}'
Request failed: RPC response failure: {"error": "Block 5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae not found", "success": false, "traceback": "Traceback (most recent call last):\n File "/nvm1/chia/src/chia-blockchain/chia/rpc/util.py", line 49, in inner\n res_object = await f(request_data)\n File "/nvm1/chia/src/chia-blockchain/chia/rpc/full_node_rpc_api.py", line 790, in get_additions_and_removals\n raise ValueError(f"Block {header_hash.hex()} not found")\nValueError: Block 5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae not found\n"}

@epudwjuhc
Author

> Do you still have the v1 DB? Is this reproducible when you convert the DB from v1 to v2? If so that is really interesting and we should be able to fix the issue. A coin could be getting dropped during the migration.

I'm afraid I deleted the v1 one. Maybe the v1 DB got corrupted when chia shut down, and I then migrated an improperly shut-down database? Maybe I should have vacuumed it first?

Anyway, this happened to me several times in the past (when the DB was far smaller; now I'm unable to start from 0). Journaled transactions or something similar should be added ASAP. It just can't be that so much data gets corrupted because of, e.g., a power outage that takes the PC down.

@wjblanke
Contributor

We use WAL for SQLite. But that doesn't mean that a drive can't corrupt things. I'm a bit concerned that you can't get information for block

https://alltheblocks.net/chia/block/5d5fcb0e572182a82d9e8ea85314894f416288b96f22a2b84f9478994dc484ae

as that is at a million or so and you are at 5 million. Also that you say you have had problems like this in the past. I am wondering if the drive hosting your chia database is having issues or maybe the controller or connection.
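
Whether a given SQLite file is in WAL mode can be checked with `PRAGMA journal_mode`. The sketch below demonstrates this on a throwaway file (the file name `example.sqlite` and the helper `journal_mode` are made up for the example); the same helper could be pointed at a copy of the node database.

```python
import os
import sqlite3
import tempfile

def journal_mode(path):
    """Return the journal mode SQLite reports for the database at `path`."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "example.sqlite")
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # WAL mode is recorded in the file itself
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.commit()
    conn.close()
    mode = journal_mode(db_path)  # reopen and read the mode back

print(mode)  # wal
```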

@epudwjuhc
Author

So, you are saying the DB was corrupted at an earlier stage but it just manifests right now because someone moved that coin from the past, is that correct?

In that case I wonder why the protocol is so inconsistent; there should be a consistent stream of blocks. I didn't look into the details, but it still looks to me like giving too much control to SQLite may be one of the design errors here. In a real blockchain one should be able to just "chop it off" at block X and start syncing from there. But it looks more like an SQL blob.

@wjblanke
Contributor

wjblanke commented May 1, 2024

There are tables for the coin store, and when reorgs happen these need to be processed to account for changes in the blockchain, so the DB is constantly changing and unfortunately can't be chopped off. We are wondering whether drive problems are interfering with these changes, only because you've had multiple issues in the past. WAL is supposed to prevent corruption due to power outages etc., but it still assumes a working disk. We think this is a DB issue; if you replace the DB, things should work again. Apologies for the problems. Please let us know if you continue to have issues, especially if they involve a different drive.
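
To illustrate why the coin tables are not append-only, here is a toy model of a reorg rollback. It is a sketch under assumed, simplified table and column names, not chia's actual code: undoing the blocks above a fork height means deleting the coins they created and marking the coins they spent as unspent again, which touches rows written by much older blocks.

```python
import sqlite3

def rollback_to(conn, height):
    """Undo every coin-store change made by blocks above `height` (toy model)."""
    # Coins created above the fork height never existed on the new chain.
    conn.execute("DELETE FROM coin_record WHERE confirmed_index > ?", (height,))
    # Coins spent above the fork height become unspent again (0 means unspent here).
    conn.execute("UPDATE coin_record SET spent_index = 0 WHERE spent_index > ?", (height,))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE coin_record ("
    "coin_name TEXT PRIMARY KEY, confirmed_index INTEGER, spent_index INTEGER)"
)
conn.executemany(
    "INSERT INTO coin_record VALUES (?, ?, ?)",
    [("old_coin", 100, 205), ("new_coin", 205, 0), ("settled", 50, 120)],
)
rollback_to(conn, 200)
rows = conn.execute(
    "SELECT coin_name, confirmed_index, spent_index FROM coin_record ORDER BY coin_name"
).fetchall()
print(rows)  # [('old_coin', 100, 0), ('settled', 50, 120)]
```

The same dependency explains the delayed symptom in this thread: the coin quoted earlier was confirmed at block 1729601 and spent at block 5201539, so a row missing since the earlier height would only be noticed when the later block tried to spend it.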
