
Work through a large number of files piece by piece #143

Open
WolfgangDpunkt opened this issue Jan 23, 2023 · 2 comments

Comments

@WolfgangDpunkt

When I try to convert a large number of HTML files to EPUB by specifying a folder with, e.g., 120 files as the source, percollate tries to process the whole batch at once. If the process fails, not a single file has been converted successfully and everything has to be started over from scratch.

There could always be a corrupt HTML file that makes the entire process fail, and since no file-by-file processing is possible, it prevents the conversion of all the other files as well.

It would be more practical if percollate processed the files one after another, so that each finished EPUB is saved as soon as it is done, rather than only once the conversion of all the other source files has succeeded.


That is my suggestion; it would make the application even more useful for power users.

Thanks a lot!

@danburzo
Owner

Hi @WolfgangDpunkt, do you mean feeding the entire list of files to a single percollate command using the --individual flag? If so, I agree that can be made more resilient to errors in some of the input files. In fact, the entire flow could use a refresh to be made more robust. In the meantime, you can use xargs -L1 to invoke separate percollate commands for each file:

cat urls.txt | xargs -L1 percollate epub

@WolfgangDpunkt
Author

Hi @danburzo ,

sorry, I expressed myself a bit unclearly.
Yes, I am feeding an entire list of files to a single percollate command using the --individual flag.
But not from a list like urls.txt; instead, I use a specific folder as the source. HTML files are saved to this folder automatically by the SingleFile browser add-on, which together gives me my own home-built read-it-later solution.
Here is the command:
percollate epub --individual --output /home/myEpubs/ /home/dTmpHtmlFolder/*.html

As I said, the result is that everything ends up in one process, which can fail entirely because of a single file.
Thanks for the tip, I will try that as a workaround.
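For a folder-based workflow like the one above, a plain shell loop gives the same file-by-file resilience as the xargs tip (a minimal sketch; the paths are the ones from this thread, and the error handling is illustrative, not part of percollate itself):

```shell
#!/bin/sh
# Convert each HTML file in its own percollate invocation, so one
# corrupt file only skips that file instead of aborting the batch.
SRC="/home/dTmpHtmlFolder"   # source folder from the thread (adjust)
OUT="/home/myEpubs"          # output folder from the thread (adjust)

for f in "$SRC"/*.html; do
  [ -e "$f" ] || continue    # skip when the glob matches nothing
  percollate epub --output "$OUT" "$f" \
    || printf 'conversion failed: %s\n' "$f" >&2  # log and keep going
done
```

Each failed conversion is reported on stderr and the loop moves on to the next file, so the finished EPUBs accumulate in the output folder regardless of any bad inputs.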
