Blockgen job fails to clean up failed reduce attempts #56

Open
xkrogen opened this issue Aug 9, 2018 · 0 comments

The block generation job has custom output logic to allow each reducer to output to multiple block files.

When speculative execution is enabled, this can result in two copies of the same block file being generated (one of which may be incomplete). This can be worked around by setting `mapreduce.reduce.speculative` to `false`.
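For reference, the workaround can be applied in `mapred-site.xml` (or per job via a `-D` override at submission time); `mapreduce.reduce.speculative` is the standard MapReduce property name, and this snippet is just a sketch of that setting:

```xml
<!-- Disable speculative execution for reduce tasks so that only one
     attempt of each reducer writes block files. -->
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```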

When a reducer attempt fails, the partial output files will not be cleaned up. I'm not aware of an easy workaround for this beyond manually cleaning up the files after the job completes.

We should have each reducer use a staging directory and only move the output files when it completes.
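A minimal sketch of that staging pattern, in Python purely for illustration (the actual job is Java/MapReduce, and the function and directory names here are hypothetical): each attempt writes its block files under its own staging directory and moves them into the final output directory only after every write succeeds, so a failed attempt never leaves partial files in the output directory.

```python
import os
import shutil
import tempfile

def write_block_files(output_dir, blocks):
    """Write each block file to a per-attempt staging dir, then commit.

    `blocks` maps block file names to their byte contents. If any write
    fails, the staging directory is removed wholesale and the final
    output directory is left untouched.
    """
    os.makedirs(output_dir, exist_ok=True)
    # Hypothetical layout: one staging dir per reducer attempt.
    staging = tempfile.mkdtemp(prefix="attempt_", dir=output_dir)
    try:
        for name, data in blocks.items():
            with open(os.path.join(staging, name), "wb") as f:
                f.write(data)
        # Commit phase: atomic per-file rename into the final location.
        for name in blocks:
            os.replace(os.path.join(staging, name),
                       os.path.join(output_dir, name))
    finally:
        # Failed attempts (and the emptied staging dir on success)
        # are cleaned up here, never left in the output tree.
        shutil.rmtree(staging, ignore_errors=True)
```

On HDFS the analogous commit step would be a `FileSystem.rename` out of the staging path, which is the same mechanism Hadoop's `FileOutputCommitter` uses to promote standard task output on commit.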
