Zombie simulated datanode #81

Open
fengnanli opened this issue Feb 27, 2019 · 1 comment

@fengnanli
Contributor

After running start-dynamometer-cluster.sh and replaying the prod audit log for some time, some simulated DataNodes (containers) lost their connection to the RM, and when the YARN application is killed these containers keep running and continue sending their block reports to the NameNode.
Since the DataNodes have gone through changes during the replay while the NameNode started from a fresh fsimage, the errors below show up on the WebHDFS page after the NameNode starts up.

Safe mode is ON. The reported blocks 1526116 needs additional 395902425 blocks to reach the threshold 0.9990 of total blocks 397826363. The number of live datanodes 3 has reached the minimum number 0. Name node detected blocks with generation stamps in future. This means that Name node metadata is inconsistent. This can happen if Name node metadata files have been manually replaced. Exiting safe mode will cause loss of 7141 byte(s). Please restart name node with right metadata or use "hdfs dfsadmin -safemode forceExit" if you are certain that the NameNode was started with the correct FsImage and edit logs. If you encountered this during a rollback, it is safe to exit with -safemode forceExit.

Checking the Datanodes tab on the WebHDFS page, a list of a couple of these zombie DataNodes shows up.

@xkrogen
Collaborator

xkrogen commented Feb 27, 2019

Thanks for reporting this @fengnanli! I think I asked before but I don't remember your answer: was this running within a secure environment using LinuxContainerExecutor / cgroups? I think that is what prevents such things from occurring in our environment.
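
For reference, enabling LinuxContainerExecutor with cgroups is typically done through yarn-site.xml on the NodeManagers. The snippet below is a minimal sketch of the relevant properties; the group name and cgroups hierarchy are example values, and the setuid container-executor binary plus container-executor.cfg also need to be configured, which is not shown here.

<!-- Use the LinuxContainerExecutor so YARN can reliably signal/kill container processes -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<!-- Unix group of the NodeManager; example value, must match container-executor.cfg -->
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
<!-- Place container processes under cgroups so their resources and lifetimes are bounded -->
<property>
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<!-- Example cgroups hierarchy used for YARN containers -->
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>/hadoop-yarn</value>
</property>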
