Request to process only new PDFs #17

Open
cutright opened this issue Jan 7, 2020 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@cutright
Owner

cutright commented Jan 7, 2020

Feature request to ignore previously processed PDFs

@cutright cutright added the enhancement New feature or request label Jan 7, 2020
cutright added a commit that referenced this issue Jan 7, 2020
@cutright
Owner Author

cutright commented Jan 7, 2020

main.process_files() in branch issue_17 now skips previously processed files. Collecting the list of processed files is fast, so the bottleneck appears to be walking the OS directory rather than parsing the data. Alternatively, the time may be spent checking whether each file name is already in the list of previously processed files.

Needs investigation.
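
One way to tell the two suspects apart is to time the directory walk and the membership check separately. The sketch below is not the code in main.process_files(); find_new_pdfs, root_dir, and processed_paths are hypothetical names used only for illustration, and it assumes processed files are tracked by full path. It stores the processed paths in a set so each lookup is constant time on average.

```python
import os
import time


def find_new_pdfs(root_dir, processed_paths):
    """Return (new_pdf_paths, timings) for PDFs under root_dir.

    Hypothetical helper: processed_paths is assumed to be an iterable
    of full file paths that were handled on an earlier run.
    """
    # A set gives O(1) average membership tests; a list would be O(n) each.
    processed = set(processed_paths)
    timings = {}

    # Phase 1: walk the directory tree and collect every PDF path.
    start = time.perf_counter()
    candidates = [
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root_dir)
        for name in filenames
        if name.lower().endswith(".pdf")
    ]
    timings["walk"] = time.perf_counter() - start

    # Phase 2: drop anything that was already processed.
    start = time.perf_counter()
    new_files = [path for path in candidates if path not in processed]
    timings["lookup"] = time.perf_counter() - start

    return new_files, timings
```

If timings["walk"] dominates, the directory traversal is the likely bottleneck. If timings["lookup"] dominates, it is worth checking whether the processed files are held in a list rather than a set.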

@cutright cutright added this to the v0.3.1 milestone Jan 10, 2020