Bor sync stuck at block 0x312d050 #1115
Comments
same |
I tried also to Any idea how to fix it? |
Same issue. |
Can you check your peers using ipc and admin.peers command ? |
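For readers who prefer a script to the interactive console, the peer check suggested above can be sketched in Python. This is a minimal sketch, assuming the node exposes a geth-style JSON-RPC IPC socket and that `admin_peers` returns entries with `name` and `network.remoteAddress` fields; the socket path in the usage note is hypothetical and depends on your data directory.

```python
import json
import socket


def build_request(method, params=None, request_id=1):
    """Encode a JSON-RPC 2.0 request as bytes."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params or [],
    }).encode()


def call_ipc(ipc_path, method, params=None, timeout=5.0):
    """Send one request over a Unix domain socket and return its 'result'."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(ipc_path)
        sock.sendall(build_request(method, params))
        buf = b""
        while True:
            chunk = sock.recv(65536)
            if not chunk:
                raise ConnectionError("socket closed before a full reply arrived")
            buf += chunk
            try:
                reply = json.loads(buf)
            except ValueError:
                continue  # reply is not complete yet, keep reading
            if "error" in reply:
                raise RuntimeError(reply["error"])
            return reply["result"]


def summarize_peers(peers):
    """Reduce an admin_peers result to (client name, remote address) pairs."""
    return [(p.get("name"), p.get("network", {}).get("remoteAddress"))
            for p in peers]
```

Usage would look like `summarize_peers(call_ipc("/path/to/bor.ipc", "admin_peers"))`, where the path is a placeholder for wherever your node creates its IPC file.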
enode://11e0cbb03a834019b0222f54bccf32512bef4294dd722642684762d1d01c84031c1075767195d9968dcdb9e38326f08b14547d8e33b0b67a0ef1aa0b045845d0@35.171.120.130:30303?discport=30315,
enode://b0f026f7ccfd5c1450e933572ae44b262a7d084647a30d0a8d9e2c8cab8d5b1c7721f3c60bfcd50c0fede114c7e2d316649389ba2449ca85d1ddd9e2947f1c28@147.135.100.106:30303?discport=30334,
enode://2d4bd1fa38182fa868a583fc946c8d5e4043b013381cf20927c16cf8f17b4f3e793c5e9f34fc785c52d887aab07181bdb0ebae50d9e3f05e5c14aed19f81929a@65.108.127.87:30303?discport=30340,
enode://ab879b4eaacf495ec760f2806e78509da80e327ba4262d8153698f88b0a95287a692bbaf3a3cece9ad27f889246c04e2b5ca8e75bf083acbb4806eb669cc3a77@35.171.120.130:30303?discport=30334,
enode://1a69f7dae12959a358b92a395ec79de2ab4601a59a5b0b951d4e6247da2101d7d6d77a919086251e70b552a49ae74d630e19233306a189a1b627c2115ecf3cfa@34.203.27.246:30303?discport=30320,
enode://574a9195f40a7c4bd68536167ef53a7385bab8934dfc8db94d013b1a73af76eb73f148536cb8b8365e8240728f6e80af0ddb4ead3a2544de907cce561839ce61@51.81.217.117:30303?discport=30323,
enode://142cce22e125325f4895b2268e32185f5dbe90f9c818ab135f16c7face23a55b46d0b78a0286595a262d4fa58ff314e7e2553e13f528a3c3e9616184b77f5b85@65.108.127.87:30303?discport=30323,
enode://50c8f9d2849a209383edd15dfd67ba0a8d3f5e9853fd1af9c1678f4aef2dc5e3817c34ddce9390d5e8dd4891ad7f66003a3bea5af9e288df6f26ed070d9bd741@54.38.217.112:30303?discport=30335,
enode://72be2da5ba01bc2f3a7764bf1d4f18550a36df629820ea0f6d37fe1cd1355d0f1c201b2a5f382e794ee56e0f5befa504e85e96548a45a0fba44bb6bd1075e28e@54.38.155.225:30303?discport=30306,
enode://53b53f55f2a1674873f8f58ee23616db8384f278a1206cf79c8c18d4ebc32b4424128229de2ea999803c08c9262974f1fb1f2b0d87ca6ec40aea1594c0ba0ef7@65.108.1.189:30303?discport=30337,
enode://eb0ee5596ea6df526eb7e0ace41f015bcb9ee4f27996c72ea15d1cd28ec69f89b6e64247696c0150111b52ca58810f5d0f42d59ac38fdb26ba7323bcc835475a@51.81.196.100:30303?discport=30313,
enode://c4a2a7c422ddce70a39164ce53762262bd5dc8917f5613b1c92c94affb36516e63f88721763a1dcfed5f36403e0fc21894e34c2981f2f6f1f100b9f186a986a1@51.38.72.15:30303?discport=30307,
enode://2197472b27c39587e2ae2c199e91527a25d25b2c1217f14c8d8b342068209a889913c7c1eb6f60044a0d28bd59ccec157d18ebb7918293e8878d11185831cf22@54.38.75.21:30303?discport=30320,
enode://b6d9bef47ce86b94331cdcfd2a1a91f28ab48db171aa70659973b3869988e7e4806fd24406c6f57187664643dffc0edf74e7a16ac315ca7933589357ec875550@51.38.72.15:30303?discport=30311,
enode://4585b746a2ae2f74575313199bd35159e8b679608fa1bd4e3a2823c0c24f8e49f9cb1e0c312de30a8b08c16a6666101897ffff47a6c162dca6ddb87c206c4cd2@66.70.233.151:30303?discport=30313,
enode://c8ab3d6ec8d7c1c7df462f55f02acaced2949ec4542475fa25ebb104feaa78a196f0e39cfc2bf1236ead1c647b734726cb9f4f03eb933c94f318cca160e5ce16@54.38.217.112:30303?discport=30334 |
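A peer list like the one above is easier to reason about once it is split into fields, for example to see how many distinct hosts it really contains (several IP addresses in the list appear more than once). The following is a minimal sketch using only the Python standard library; the function names are illustrative, not part of bor.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def parse_enode(url):
    """Split an enode URL into node id, host, TCP port and discovery port."""
    parts = urlsplit(url.strip().rstrip(","))
    if parts.scheme != "enode":
        raise ValueError(f"not an enode URL: {url!r}")
    query = parse_qs(parts.query)
    # Without an explicit discport, discovery uses the same port number as TCP.
    discport = int(query["discport"][0]) if "discport" in query else parts.port
    return {
        "node_id": parts.username,
        "host": parts.hostname,
        "tcp_port": parts.port,
        "discovery_port": discport,
    }


def hosts_by_frequency(urls):
    """Count how many enode entries point at each host, most common first."""
    return Counter(parse_enode(u)["host"] for u in urls).most_common()
```

Feeding the pasted list to `hosts_by_frequency` shows at a glance whether the node's peers are concentrated on a handful of machines.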
Sure. I rolled back from bor v1.2.1 to v1.1.0 because some issues suggested that rolling back might be a solution. I then noticed an interesting pattern: every time I restart the bor service, it syncs for a while and then gets stuck with the above "whitelist milestone" log.
|
Any idea how we can solve the issue? |
I tried to apply https://forum.polygon.technology/t/recommended-peer-settings-for-mainnet-nodes/13018 |
The above suggestions do not fix the issue. Any other suggestions? |
No luck. I tried a new physical machine with bor 1.1.0 and Heimdall 1.0.3 using snapshot data, and it happened all over again: it gets stuck randomly. The original node, after weeks of manual restarts, finally ran well for half a month. I am not sure why, and I am afraid it will unexpectedly get stuck again someday. |
@0xKrishna I think I might have hit the same problem on two nodes. The first node stopped importing blocks ~8 days ago, the other around 2 hours ago.

Node Stopped 2 hours ago (Stopped 2024-01-16 @ 18:30:00 EST)
I have the pprof goroutine dump for it, see pprof.geth.goroutine.polygon-mainnet-0.pb.gz. It seems to be blocked at https://github.com/maticnetwork/bor/blob/master/core/blockchain.go#L1888.

Node Stopped 8 days ago (Stopped 2024-01-09 @ 12:00:00 EST)
I have a pprof too, see pprof.geth.goroutine.polygon-mainnet-1.pb.gz. On this one I don't clearly see what is blocked. I don't even seem to see the blockchain import goroutine there, so I am not sure what it was doing. For this dump, I have a

Let me know if you need more info. I'll follow the nodes more closely to see if they get stuck again so I can gather extra data points.

Extra Details
I tried to stop this node cleanly by sending a single SIGINT signal, then waited 4 hours for it to shut down, but it never did. I decided to force kill it, which means that in this state the stuck node never completed the clean shutdown sequence. |
Same issue on two independent nodes, random block stuck with ERROR: |
Hey @eldimious @VSGic @maoueh @GeassV ,
Thank you! 💜 |
well, stuck at 52755409 and then moved to 52756404 and stuck again when trying to dump the log and config files |
Hello, the same problem after downgrade. |
Hello, I have the same issue. The bor node is stuck at block number 52962568.
I tried to restart the bor node, but it took a long time to stop. Eventually systemd killed it after the 'stop-sigterm' timeout. After starting again, the block number rolled back to 52921882, far behind the stuck block number 52962568. |
@CaCaBlocker You can ignore these logs for now as your node is not completely synced. |
@RyanWang0811 Is it working now? |
It is working now. thx. |
Hi, still have this problem, I restart bor 3-5 times per day |
Still have this problem, too. It looks like the issue I posted previously, which does not seem to have been fixed. Is it a node bug, or an issue on the chain? |
The problem persists: two nodes with different bor versions are both struggling. |
Hey @RyanWang0811 @VSGic what specific errors are you facing currently? Can you share some logs? |
Hello @Raneet10 I have posted logs above here. I have two nodes, one of them with bor v1.2.3, and it has the same problem. |
I encountered this problem using the latest version on the testnet, and there is no solution yet. heimdall: v1.0.4-beta, bor: v1.2.6-beta |
Hello! Just wanted to mention that we are experiencing the same issues with our 2 Polygon bor nodes. I have set up a liveness probe (k8s) to restart the node if it gets stuck for more than 15 min. It kind of works, but it's really annoying and we still have small interruptions when both nodes get stuck at the same moment. It happens multiple times per day. It's really bad. Is anything planned to fix those issues? By the way, I compared the errors I got in the Heimdall and bor logs while the node was stuck on a block with the logs from the other node that was working, and I found exactly the same errors in both. So whatever causes the issue is evidently not being logged... |
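The stall check behind such a probe can be sketched as follows. This is a minimal sketch, not the commenter's actual probe: it assumes a long-running helper that periodically reads the node's head height (for example from `eth_blockNumber`) and feeds it to a hypothetical `StallDetector`; the 15-minute threshold mirrors the comment above.

```python
import time


class StallDetector:
    """Report a node as unhealthy when its head height stops advancing."""

    def __init__(self, max_stall_seconds=15 * 60, clock=time.monotonic):
        self.max_stall_seconds = max_stall_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_height = None
        self._last_progress = None

    def observe(self, height):
        """Record the current head height; return True while healthy."""
        now = self._clock()
        if self._last_height is None or height > self._last_height:
            # First sample, or the chain head moved forward: reset the timer.
            self._last_height = height
            self._last_progress = now
            return True
        # No forward progress: healthy only while inside the allowed window.
        return (now - self._last_progress) <= self.max_stall_seconds


def parse_block_number(hex_result):
    """Convert a hex quantity such as '0x312d050' to an integer height."""
    return int(hex_result, 16)
```

A height that goes backwards (as after the rollback reported earlier in this thread) counts as no progress here, which is a deliberate simplification.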
Hello, we still have this trouble. We cannot send transactions through such a node: they get lost when the node is out of sync. We are operating our Polygon nodes in manual mode. |
Hi, this is still happening and has become worse: one node cannot even get synced after a reboot and gets stuck again along the way. |
Also faced this issue when bootstrapping node from official snapshot. Seems that removing |
After updating to 1.2.7 the problem still exists: bor loses sync with the blockchain at random moments, and only a reboot makes it resume syncing from the stuck block, but then it happens again. |
Hey, it would be really helpful if we could get the stack trace to see where the bor process is stuck and track down the root cause. You can get it in either of the 2 ways below.
Could you help us with the same? Thanks! |
Hi, @manav2401 |
Hello, my current bor client is also stuck in a certain block. The block is 54875999. I rolled back the bor client blocks to 500, 2000, and 15000 and still the problem has not been solved.
|
Hey guys! bor is stuck in this state now:
Hey, I think I had the same issue, but after trying to fix it, I'm pretty sure I broke my database (it is not starting anymore). I need to download the bor snapshot, but the last image is from February. Is there another link, more recent than this one, to download Polygon snapshots? |
I can confirm I’m facing the same issue as described here. If anyone has any more recent updates could they post them here with things they’ve tried. |
Hi, the number of my peer nodes has always been very low, and the quality of the peers found is very poor, so the data cannot be synchronized. Is there any solution? |
I turned the verbosity up on bor to 4 and restarted. It sync'd to head, and then stopped sync'ing. I saw this in the logs:
After another restart I got a bunch of problems with the ip table limit:
Additionally, I found this post on geth, ethereum/go-ethereum#1563, where the peer count is low and getting accepted by other peers doesn't seem to happen easily. |
Interesting: even though my blocks aren't coming through, I do see the occasional transaction appear. So obviously some connectivity is still occurring. Where do blocks come from? I thought bor was the recipient of blocks, but is it actually heimdall? |
Ok, randomly last night, after yet another restart, I was able to get blocks through steadily and it hasn't stopped since. The last change I made was to increase maxpeers to 200 (maxpeers=200) in config.toml for Bor. I don't know exactly why this would have been the change that mattered, but it appears to have worked. |
Really thanks, nice try; Mine stuck and not sync even with manual restart. Followed your setting, seems fine now |
Hi everyone. Updating the setting to maxpeers=200 did not help for me. It still gets stuck. |
I'm reasonably confident it's a problem with (a) having decent peers and (b) having connectivity to those peers and discoverability. Firstly check some things:
I found that once I had set up correct port exposure and enough peers, it seems to continue to work ok. |
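One way to sanity-check port exposure is a plain TCP connect test against the node's P2P port, run from a machine outside the node's network. This is a minimal sketch; the helper name is illustrative, and port 30303 in the usage note is taken from the enode URLs quoted earlier in this thread rather than from any particular configuration. It covers TCP only, so it says nothing about UDP discovery traffic.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `tcp_port_open("your.node.public.ip", 30303)` returning False from an external host would suggest that a firewall or NAT rule is blocking inbound peers.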
Alternatively you can just utilise my list of peers:
Put these in your bootnodes/static-nodes and hopefully you will get running. |
@james-turner Thank you for the explanation. I followed your instructions; in the end they did not help, but the node's behavior has changed. It no longer gets stuck, but its lag behind the chain head rises and falls irregularly. |
Well, it survived for about two weeks and then got stuck again; now it syncs for a while and gets stuck repeatedly. |
Hey all,

We are no longer providing snapshots for the community. Instead, we have transitioned to a community-driven model where snapshots are provided by some of the most active members. These include the following validators: Vault Staking (Mainnet/Mumbai), Stakepool (Mainnet/Amoy), StakeCraft (Mainnet/Mumbai/Erigon Archive) and Girnaar Nodes (Amoy).

Also, StakeCraft has introduced a new service for the Polygon community - All4nodes.io aggregator service where snapshots from all different providers can be found. More details here: https://forum.polygon.technology/t/stakecraft-introduces-a-new-service-for-polygon-community-all4nodes-io-aggregator-service/13694/1

This decision has been made to foster greater community involvement and to distribute responsibilities more equitably among our dedicated community members. Empowering the community to generate snapshots will not only ensure their timely availability but also promote collaboration and engagement within our community.

For inquiries, contact community services directly.

Regards, |
So the problem of quickly downloading a snapshot for bor is now secondary. More serious is that bor regularly gets stuck, because it is not possible to submit transactions through such nodes, and this affects business. |
System information
Bor client version: 1.2.1
Heimdall client version: 1.0.3
OS & Version: Linux
Environment: Polygon Mainnet
Type of node: Full
Overview of the problem
I have been running a full node using bor and heimdall via docker for the last 2 months, but bor sync got stuck 11h ago at block 0x312d050. I am getting the following logs from the bor docker image:
Any idea how I can fix it? I tried to restart the docker image but the error remains.
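Since later comments in this thread quote block heights in decimal, it helps to convert the hex height from the title when comparing. A short Python sketch of the conversion (the helper names are illustrative):

```python
def block_number_from_hex(hex_height):
    """Convert a 0x-prefixed hex block height to a decimal integer."""
    return int(hex_height, 16)


def block_number_to_hex(height):
    """Convert a decimal block height back to its 0x-prefixed hex form."""
    return hex(height)


print(block_number_from_hex("0x312d050"))  # 51564624
```

So the block in the title, 0x312d050, is block 51564624.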