pdf file inside a rar archive cannot be opened by zathura #136
Comments
Thanks for the issue report (and the archive). I am no expert in the .pdf format; actually I know basically nothing about it. If you mount the archive and then copy the file from the mount point to some location on your regular file system, does zathura open it without problems then? Please note also that
yes, as I already wrote - if the pdf file is copied from the rar2fs mount point into my home directory then zathura works correctly. And archivemount does a fuse mount as well. To test, I'd create a 2MiB file filled with some pattern, say the repeated sequence from 0 to 255. Then I'd try to randomly access a 256-byte chunk (or any other size) and check if the pattern holds. It could be done even with a perl script. Say, if I read 256 bytes beginning at offset 238774, then I should get ((238774%256)+i)%256, where i ∈ [0..255]. (I hope I put it right)
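The test described above could be sketched as follows (in Python rather than perl; the file size, chunk size, and pattern are from the description above, the function names are my own):

```python
import os
import random

PATTERN_PERIOD = 256
FILE_SIZE = 2 * 1024 * 1024  # 2 MiB test file

def make_pattern_file(path):
    # Fill the file with the repeating byte sequence 0..255.
    with open(path, "wb") as f:
        f.write(bytes(range(PATTERN_PERIOD)) * (FILE_SIZE // PATTERN_PERIOD))

def check_random_reads(path, trials=100, chunk=256):
    # Read `chunk` bytes at random offsets and verify that each byte
    # equals (offset + i) % 256, i.e. the pattern survives random access.
    with open(path, "rb") as f:
        for _ in range(trials):
            offset = random.randrange(FILE_SIZE - chunk)
            f.seek(offset)
            data = f.read(chunk)
            for i, b in enumerate(data):
                assert b == (offset + i) % 256, \
                    f"mismatch at offset {offset + i}"
    return True
```

To use it against rar2fs, one would create the pattern file on a regular file system, pack it into a rar archive, mount the archive, and then run the check against the file inside the mount point.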
What you have here is not really related to random access in general, but to the fact that zathura tries to read 4k at the very end of the file before reading from offset 0. I am not sure how this relates to the
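This end-of-file-first access is typical of PDF readers, since a PDF stores its trailer and the `startxref` offset at the end of the file. A minimal sketch of the observed pattern (the function name and the 4k tail size are illustrative, taken from the description above):

```python
import io

def pdf_style_probe(path, tail=4096):
    # Mimic the access pattern observed from zathura: read a block from
    # the very end of the file first (where a PDF keeps its trailer and
    # the `startxref` offset), then jump back to offset 0. It is exactly
    # this backwards jump that rar2fs has to replay or buffer.
    with open(path, "rb") as f:
        f.seek(0, io.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail))
        trailer = f.read(tail)
        f.seek(0)
        header = f.read(tail)
    return header, trailer
```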
That was not exactly what you said, if you look back ;) Hence the question. I am not too sure what I can do here right now, other than explain the rationale behind why it works just like you described. Of course we could also start to extract everything ahead of time, but that would be a real pain for large contents. Possibly the access pattern could trigger such a fallback, but it might also make other things break. EDIT: I did a quick comparison and not even zathura
gimp
What you can see here is that rar2fs does a sort of replay trap, trying to restart the read so that it becomes aligned. But the trap is triggered once for zathura and twice for gimp, which can be seen from the same sequence number appearing multiple times. Other than that they look fairly similar.
can an artificial read at offset 0 be inserted by rar2fs then, so that the first read is always done by rar2fs and discarded? Would that help?
No, that is not really the problem. But I did try disabling the long jump hack we have to specifically handle e.g. AVI files, and then it works as long as the I/O buffer is greater than or equal to the size of the .pdf file.
But hang on, there might be a quick fix for this. I would need to carefully run some regression tests on it before it could be released to the public, but if you would accept a patch for your own testing, it might be possible using just a few lines of code.
sure I can test it |
Please try this one.
EDIT: Note that the whole idea behind this is that you control this yourself using the size of the I/O buffer. An access far away from the current file position would otherwise trigger a long jump hack suitable only for some specific file formats. But if the I/O buffer is big enough, it can digest all the information and the hack is avoided. The default size of the I/O buffer is 4MB, which you can control using the
nope, doesn't seem to help. Neither with a 2.5 MiB nor with a 25 MiB pdf.
If you have a 2.5MB .pdf you need an I/O buffer of 8MB. For a 25MB .pdf you need 64MB. You can reduce the size by half if you set the history size to 0; by default it is 50% of the I/O buffer size.
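The sizing rule described above can be sketched like this (a sketch under the assumptions stated in this thread: the buffer size is a power of two in MB, and only the part not reserved for history is usable for the file's contents; the function name is my own):

```python
def required_iob_size_mb(file_size_mb, history_fraction=0.5):
    # The usable part of the I/O buffer is what remains after the history
    # portion (50% by default) is reserved. Double the buffer size until
    # the usable part can hold the whole file.
    iob = 1
    while iob * (1 - history_fraction) < file_size_mb:
        iob *= 2
    return iob
```

This reproduces the figures quoted above: a 2.5MB .pdf needs an 8MB buffer (4MB usable), a 25MB .pdf needs 64MB, and setting the history size to 0 halves the requirement.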
ok, with the patch applied and with --iob-size=64 it seems to work correctly for small files (less than 2MiB) and for big files (bigger than 100MiB). This covers all my use cases. Thanks
It is an interesting observation though. This would mean the PDF file format is rather volatile. |
I will have to start looking into what can be done here. I have put the issue in the backlog. |
I've cloned the sources today and built rar2fs. If I mount a rar archive containing a pdf file, zathura won't open the pdf, saying 'Document does not contain any pages'. If I copy the pdf file to a regular file system, it can be opened by zathura. I believe this is related to the random access problem.
Additionally, I put the same pdf into a zip archive, which was also fuse mounted using archivemount, and zathura opened it normally too. So it really is rar2fs that produces this strange problem.
ps: some pdf files inside rar archives mounted by rar2fs can be opened normally. This is probably related to the pdf file structure.
psps: the problematic pdf that I have could be subject to copyright claims, so I cannot simply attach it. But I'll try to find a similar file