
improvements to 'non-graceful' interrupt to ensure WARCs are still closed gracefully #504

Merged
ikreymer merged 6 commits into main from improved-interrupt on Mar 21, 2024

Conversation

ikreymer
Member

The intent is for even a non-graceful interruption (a duplicate Ctrl+C) to still result in valid WARC records, even if a page is unfinished:

  • immediately exit the browser, and call closeWorkers()
  • finalize() the recorder: finish active WARC records, but don't fetch anything else
  • flush() the existing open writer and mark it as done; don't write anything else
  • possible fix for the additional issues raised in "WARC Validation Error appears from time to time" (#487)

This should work with multiple SIGINT/SIGTERM signals. Sending a SIGKILL / docker kill .. exits the crawler immediately, before any of the cleanup above can run, so it may still result in invalid WARC records.
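The sequence above can be sketched roughly as follows. This is a minimal illustration and not the crawler's actual code: the class `InterruptSketch`, its fields, and the step names are hypothetical, and it assumes that the first signal requests a graceful stop while any later signal triggers the non-graceful path exactly once.

```typescript
// Hypothetical sketch: names are illustrative, not the crawler's real API.
type Step =
  | "gracefulStop"
  | "closeBrowser"
  | "closeWorkers"
  | "finalizeRecorder"
  | "flushWriter";

class InterruptSketch {
  private signalCount = 0;
  private writerDone = false;
  readonly steps: Step[] = [];
  readonly records: string[] = [];

  // Call once per SIGINT/SIGTERM received.
  onSignal(): void {
    this.signalCount += 1;
    if (this.signalCount === 1) {
      // Assumption: the first signal lets in-progress pages finish.
      this.steps.push("gracefulStop");
      return;
    }
    if (this.writerDone) {
      // Further signals are no-ops once cleanup has already run.
      return;
    }
    // Non-graceful path: stop the browser and workers, then close out
    // any active WARC records and flush the open writer.
    this.steps.push("closeBrowser", "closeWorkers", "finalizeRecorder", "flushWriter");
    this.writerDone = true;
  }

  // Returns false once the writer is marked done, so nothing is
  // appended after the final flush.
  write(record: string): boolean {
    if (this.writerDone) {
      return false;
    }
    this.records.push(record);
    return true;
  }
}
```

In a Node.js process, `onSignal()` could be registered as the handler via `process.on("SIGINT", ...)` and `process.on("SIGTERM", ...)`. SIGKILL cannot be caught by a handler, which is why it can still leave incomplete records.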

@ikreymer ikreymer requested a review from tw4l March 21, 2024 17:25
Contributor

@tw4l tw4l left a comment


Looks good! Just some formatting suggestions in the docs updates (thanks for those too!)

Two review suggestions on docs/docs/user-guide/common-options.md (outdated, resolved)
Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
@ikreymer ikreymer merged commit 93c3894 into main Mar 21, 2024
4 checks passed
@ikreymer ikreymer deleted the improved-interrupt branch March 21, 2024 20:56