segfault when testing a bad WARC ending in gzip header (10 bytes) and no data on Mac #99
Comments
Hi Ilya, thank you for reporting this issue. Unfortunately, it seems we have some cross-platform support problems, which your issue highlights and which we would like to fix. One thing that would help us greatly would be to find the
@trym-b yes, I've managed to isolate the specific WARC. It seems to be an invalid WARC with the last record truncated (which of course should be an error), and the crash has nothing to do with recursive/directory checking.
I believe the issue with that file is that it ends with a complete record followed by the 10-byte gzip header of the next record, with no data after that.
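To make that failure shape concrete, here is a minimal Python sketch that builds such a byte stream and checks for it. It is not taken from this project's code: the helper name `count_complete_members` and the hand-built 10-byte header are assumptions for illustration only. It walks the concatenated gzip members and reports how many decode completely and how many bytes are left dangling at the end.

```python
import gzip
import zlib

# A bare 10-byte gzip header with no optional fields (hypothetical test input):
# magic 1f 8b, method 8 (deflate), flags 0, mtime 0, xfl 0, OS 0xff (unknown).
GZIP_HEADER = bytes([0x1F, 0x8B, 0x08, 0, 0, 0, 0, 0, 0, 0xFF])


def count_complete_members(data: bytes) -> tuple[int, int]:
    """Return (complete gzip members, trailing bytes that form no complete member)."""
    members = 0
    offset = 0
    while offset < len(data):
        decoder = zlib.decompressobj(31)  # wbits=31 selects the gzip container
        try:
            decoder.decompress(data[offset:])
        except zlib.error:
            break  # corrupt data: stop and report the remainder as trailing bytes
        if not decoder.eof:
            break  # member started but never finished: truncated input
        members += 1
        # unused_data holds whatever follows the member that just ended
        offset = len(data) - len(decoder.unused_data)
    return members, len(data) - offset


# One complete (placeholder) record followed by only the next record's gzip header.
record = gzip.compress(b"WARC/1.1\r\nplaceholder record body\r\n")
complete, trailing = count_complete_members(record + GZIP_HEADER)
print(complete, trailing)
```

A non-zero trailing count is the case a reader would be expected to report as a regular error rather than crash on.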
I am happy that you were able to root-cause it so quickly. After downloading the file and opening it with the following command, I also get a segfault.
We have made some code changes since the release, so it is likely that the version I tested on (latest
I agree that a malformed WARC file should not, in general, result in a panic/segfault. I will look into fixing this error. I hope I can use the WARC you provided as test data, if you don't mind?
Of course, feel free to use this as test data.
# Motivation
Many top-level commands are missing tests. This is a first step to add a test for `ls` with proper test data.
# Changes
This commit adds a very simple test for `ls` that checks that it does not crash when reading a file.
# Future work
Add more checks to the test so that it is even better at avoiding regressions.
# Acknowledgements
Thanks to Ilya Kreymer for providing the test data through issue #99
# Motivation
Many top-level commands are missing tests. This is a first step to add a test for `ls` with proper test data.
# Changes
This commit adds a very simple test for `ls` that checks that it does not crash when reading a file. Also added `lfs` checkout to every workflow so that any lfs file is fetched correctly.
# Future work
Add more checks to the test so that it is even better at avoiding regressions.
# Acknowledgements
Thanks to Ilya Kreymer for providing the test data through issue #99
Tried out the latest release on Mac, and am getting this segfault with Browsertrix Crawler WARCs:
Have not had time to isolate which WARC is causing it exactly, but could later, if that would be helpful.
Tried both with `-r` and with a `--source-file-list`, with default concurrency.