Nimbus falling back to doppelgänger unexpectedly #5845

Open
stefa2k opened this issue Feb 2, 2024 · 2 comments

Comments

@stefa2k

stefa2k commented Feb 2, 2024

Describe the bug
A Nimbus Validator Client that had been running for 12 days unexpectedly fell back to doppelgänger mode on January 24, 2024. The VC was configured with 4 beacon nodes, 3 of which were in sync and working properly. This is not a regular occurrence: it is the first time it has been observed despite long-term use of Nimbus with multiple instances.

Log:

{"log":"NTC 2024-01-24 11:02:55.132+00:00 Attestation published                      delay=4s132ms976us539ns service=attestation_service validator=a6d0ddcd@552979 attestation=\"(aggregation_bits: 0b00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, data: (slot: 8272512, index: 53, beacon_block_root: \\\"a790667a\\\", source: \\\"258515:f20a4e4e\\\", target: \\\"258516:a790667a\\\"), signature: \\\"86048cbd\\\")\"\n","stream":"stdout","time":"2024-01-24T11:02:55.13309882Z"}
{"log":"NTC 2024-01-24 11:02:55.689+00:00 Aggregated attestation published           delay=689ms116us899ns service=attestation_service validator=80e5b8d1@554956 attestation=\"(aggregation_bits: 0b111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111101111111111111111111111111111111111111111111111111111111111111111111111111111111111111, data: (slot: 8272512, index: 11, beacon_block_root: \\\"a790667a\\\", source: \\\"258515:f20a4e4e\\\", target: \\\"258516:a790667a\\\"), signature: \\\"a6886b77\\\")\"\n","stream":"stdout","time":"2024-01-24T11:02:55.689385946Z"}
{"log":"INF 2024-01-24 11:02:59.021+00:00 Slot start                                 slot=8272513 epoch=258516 attestationIn=4s blockIn=\u003cunknown\u003e validators=500 good_nodes=3 viable_nodes=0 bad_nodes=1 delay=21ms321us760ns\n","stream":"stdout","time":"2024-01-24T11:02:59.022051125Z"}
{"log":"WRN 2024-01-24 11:02:59.024+00:00 Unable to publish sync committee messages and contributions in time slot=8272512 timeout=11s977ms979us74ns service=sync_committee_service\n","stream":"stdout","time":"2024-01-24T11:02:59.024589419Z"}
{"log":"NTC 2024-01-24 11:03:02.690+00:00 Beacon node is online                      agent_version=Lodestar/v1.14.0/5ac2fae node=http://x.x.x.x:9596[Lodestar/v1.14.0/5ac2fae] node_index=2 node_roles=AGBSDT\n","stream":"stdout","time":"2024-01-24T11:03:02.691318235Z"}
{"log":"NTC 2024-01-24 11:03:02.835+00:00 Beacon node is compatible                  node=http://x.x.x.x:9596[Lodestar/v1.14.0/5ac2fae] node_index=2 node_roles=AGBSDT\n","stream":"stdout","time":"2024-01-24T11:03:02.835785233Z"}
{"log":"NTC 2024-01-24 11:03:02.838+00:00 Beacon node is in sync                     head_slot=8272512 sync_distance=0 is_optimistic=false node=http://x.x.x.x:9596[Lodestar/v1.14.0/5ac2fae] node_index=2 node_roles=AGBSDT\n","stream":"stdout","time":"2024-01-24T11:03:02.838571735Z"}
{"log":"NTC 2024-01-24 11:03:03.018+00:00 Doppelganger detection active - skipping validator duties while observing the network topics=\"val_pool\" validator=895caf9f slot=8272513 doppelCheck=ok(258514) activationEpoch=188936\n","stream":"stdout","time":"2024-01-24T11:03:03.019345859Z"}
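
For anyone triaging a similar capture, the records above can be filtered mechanically. The following is a minimal Python sketch, assuming the capture is Docker json-file output wrapping Nimbus text lines in the `LEVEL date time message key=value ...` layout seen here. `parse_line` and `find_events` are hypothetical helper names for illustration, not part of Nimbus.

```python
import json
import re

# key=value pairs; a value is either a double-quoted string (with escapes)
# or a bare token without whitespace.
KV_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(raw):
    """Parse one Docker json-file record that wraps a Nimbus text log line."""
    text = json.loads(raw)["log"].rstrip("\n")
    level, date, time, rest = text.split(" ", 3)
    first_kv = KV_RE.search(rest)
    # Everything before the first key=value pair is the human-readable message.
    message = (rest[: first_kv.start()] if first_kv else rest).strip()
    fields = {m.group(1): m.group(2) for m in KV_RE.finditer(rest)}
    return {
        "level": level,
        "timestamp": date + " " + time,
        "message": message,
        "fields": fields,
    }


def find_events(raw_lines):
    """Keep warnings and doppelganger notices, in their original order."""
    events = []
    for raw in raw_lines:
        rec = parse_line(raw)
        if rec["level"] == "WRN" or rec["message"].startswith(
            "Doppelganger detection active"
        ):
            events.append(rec)
    return events


# Two records copied from the log in this report.
SAMPLE = [
    r'{"log":"WRN 2024-01-24 11:02:59.024+00:00 Unable to publish sync committee messages and contributions in time slot=8272512 timeout=11s977ms979us74ns service=sync_committee_service\n","stream":"stdout","time":"2024-01-24T11:02:59.024589419Z"}',
    r'{"log":"NTC 2024-01-24 11:03:03.018+00:00 Doppelganger detection active - skipping validator duties while observing the network topics=\"val_pool\" validator=895caf9f slot=8272513 doppelCheck=ok(258514) activationEpoch=188936\n","stream":"stdout","time":"2024-01-24T11:03:03.019345859Z"}',
]

for event in find_events(SAMPLE):
    print(event["timestamp"], event["level"], event["message"], event["fields"].get("slot"))
```

Run over the full capture, this would list the sync committee warning for slot 8272512 followed by the doppelganger notice for slot 8272513, which is the ordering the report is about.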

To Reproduce
Steps to reproduce the behavior:
unknown

Additional context

@cheatfate
Contributor

Sorry, but this is not unexpected. The VC concluded that the corresponding Beacon Node did not respond in time (timeout=11s977ms979us74ns), so it considers that Beacon Node to have gone offline.

{"log":"WRN 2024-01-24 11:02:59.024+00:00 Unable to publish sync committee messages and contributions in time slot=8272512 timeout=11s977ms979us74ns service=sync_committee_service\n","stream":"stdout","time":"2024-01-24T11:02:59.024589419Z"}

In that case it reactivates doppelganger detection.
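
To put the quoted timeout in context, the duration string can be converted to a plain number. This is a small Python sketch, assuming the log's durations use only the `s`, `ms`, `us` and `ns` units seen in this thread; `duration_to_ns` is a hypothetical helper, not a Nimbus API. The comparison against 12 seconds uses the mainnet slot length. How Nimbus derives the timeout internally is not shown in this thread.

```python
import re

# Nanoseconds per unit, limited to the units that appear in the log above.
UNIT_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

# Two-letter units are listed first so "ms" is not read as "m" + "s".
TOKEN_RE = re.compile(r"(\d+)(ns|us|ms|s)")


def duration_to_ns(text):
    """Convert a string such as '11s977ms979us74ns' to integer nanoseconds."""
    total = 0
    pos = 0
    for match in TOKEN_RE.finditer(text):
        if match.start() != pos:
            raise ValueError("unexpected characters in duration: " + repr(text))
        total += int(match.group(1)) * UNIT_NS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError("could not parse duration: " + repr(text))
    return total


SLOT_NS = 12 * UNIT_NS["s"]  # mainnet slot length, used only for comparison

timeout_ns = duration_to_ns("11s977ms979us74ns")
print(timeout_ns, SLOT_NS - timeout_ns)
```

For the value in the warning this gives 11977979074 ns, about 22 ms short of a full 12 s slot.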

@stefa2k
Author

stefa2k commented Feb 2, 2024

Thank you for taking the time to look into this issue! I discussed this with @tersec on Discord and we weren't sure what caused it. The Nimbus VC had 3 healthy beacon nodes, and the only change was the 4th one coming back to life. We (RockLogic) do this a lot: maintenance requires taking full nodes connected to Nimbus VCs offline, updating them, and bringing them online again. We have never experienced this behaviour on the Nimbus VC before.

Discord messages: https://discord.com/channels/613988663034118151/613988663034118153/1199672828258168892

/E: Sorry, it was @arnetheduck with whom I had the discussion, not tersec 🙏
