OOM Crash - failed to store on BlockDropped in retainer #934

Closed
shufps opened this issue Apr 25, 2024 · 2 comments · Fixed by #947 or #946
Labels
team-node Issues for Node Team

@shufps
Contributor

shufps commented Apr 25, 2024

We have two nodes that crashed after running out of memory (OOM).

It seems they started to log this error message:

Protocol.Engine0    	engine error (err=blockRetainer: failed to store on BlockDropped in retainer: cannot update block metadata for block BlockID(0xbc718142f4c3957f2e7484dec30b891a9edfc09b2d50c8faa8d753d09bb8dc12d4830000:33748) with state dropped as block is already committed)

The message was logged about 50k times per hour.

Memory usage grew sharply at the same time:
image

We have a log file from when it started:
faucet.h.iota2-alphanet_2024-04-24-09.log

Unfortunately it happened at night, so we have no memory profile of this node.

But we have a profile of another node that started logging the error at the same time but "recovered" later on (its memory usage is still high):
image

pprof.validator-2_20240425-075134_all.zip

Maybe it shows something 🙈

@alexsporn alexsporn added the team-consensus Issues for Consensus Team label Apr 25, 2024
@alexsporn alexsporn added this to the v1.0.0-beta milestone Apr 25, 2024
@alexsporn
Member

Same underlying deadlock in the DRR scheduler as in #936

@alexsporn
Member

goroutine 8456281 [sync.RWMutex.RLock, 1150 minutes]:
sync.runtime_SemacquireRWMutexR(0xc00048bb08?, 0xa0?, 0xc0004ef560?)
	/usr/local/go/src/runtime/sema.go:82 +0x25
sync.(*RWMutex).RLock(...)
	/usr/local/go/src/sync/rwmutex.go:70
github.com/iotaledger/iota-core/pkg/protocol/engine/congestioncontrol/scheduler/drr.(*Scheduler).ReadyBlocksCount(0xc000346fa0)

@alexsporn alexsporn added team-node Issues for Node Team and removed team-consensus Issues for Consensus Team labels Apr 29, 2024
This was linked to pull requests Apr 29, 2024