
Convert rtf files to plaintext #19

Open
dputtick opened this issue Aug 4, 2017 · 0 comments

dputtick (Contributor) commented Aug 4, 2017

Currently, filecheck.py leaves rtf files untouched and only changes their extension to .txt (in File.text()). When opened as plaintext, an rtf file is hard to read because its formatting control codes are mixed in with the text. Ideally, we should extract the text content from an rtf file during processing.

Unfortunately, there aren't any great existing solutions for this other than OpenOffice, which would add a dependency we'd prefer not to have. If we don't need 100% compatibility, writing an rtf parser is fairly reasonable: here is a library that implements most of the behavior we want and could be a good starting point. However, that code is Python 2 only, and perhaps a little verbose.
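For a sense of scale, here's a rough Python 3 sketch of the general approach such a parser can take. A regex tokenizer splits the input into control words, hex escapes, control symbols and braces, and a stack tracks per-group state so that metadata groups (font table, color table, stylesheet, `\*` destinations) are skipped. This is not a port of the library mentioned above. The name `rtf_to_text`, the deliberately short skip list, and the assumption that `\'hh` escapes use the Windows-1252 code page are all my own choices and would need checking against real files.

```python
import re

# One alternative per RTF token kind: control word (\word, optional numeric
# argument, one optional delimiting space), hex escape (\'hh), control symbol
# (\ plus a non-letter), group brace, raw line break (ignored), other char.
_TOKEN = re.compile(
    r"\\([a-z]{1,32})(-?\d{1,10})?[ ]?"
    r"|\\'([0-9a-f]{2})"
    r"|\\([^a-z])"
    r"|([{}])"
    r"|[\r\n]+"
    r"|(.)",
    re.IGNORECASE,
)

# Groups that start with these words hold metadata, not body text.
# Assumption: a short, incomplete list; real documents may need more entries.
_SKIP_DESTINATIONS = frozenset({
    "fonttbl", "colortbl", "stylesheet", "info", "pict",
    "fldinst", "listtable", "listoverridetable", "rsidtbl", "generator",
})

# Control words that stand for visible characters.
_SPECIAL = {
    "par": "\n", "line": "\n", "sect": "\n\n", "page": "\n\n",
    "tab": "\t", "emdash": "\u2014", "endash": "\u2013",
    "lquote": "\u2018", "rquote": "\u2019",
    "ldblquote": "\u201c", "rdblquote": "\u201d", "bullet": "\u2022",
}


def rtf_to_text(rtf: str) -> str:
    """Return the visible text of an RTF document (best effort, not full spec)."""
    stack = []          # saved (skip_group, uc_skip) for each open "{"
    skip_group = False  # True while inside a group we are discarding
    uc_skip = 1         # fallback chars that follow each \uN (set by \ucN)
    pending_skip = 0    # fallback chars still to drop after the last \uN
    out = []

    for m in _TOKEN.finditer(rtf):
        word, arg, hexcode, symbol, brace, char = m.groups()
        if brace:
            pending_skip = 0
            if brace == "{":
                stack.append((skip_group, uc_skip))
            elif stack:  # tolerate an unbalanced "}"
                skip_group, uc_skip = stack.pop()
        elif symbol:
            pending_skip = 0
            if symbol == "*":           # \* marks an ignorable destination
                skip_group = True
            elif skip_group:
                continue
            elif symbol in "{}\\":      # escaped literal brace or backslash
                out.append(symbol)
            elif symbol == "~":         # non-breaking space
                out.append("\u00a0")
            elif symbol in "\r\n":      # backslash + line break acts like \par
                out.append("\n")
        elif word:
            pending_skip = 0
            if word in _SKIP_DESTINATIONS:
                skip_group = True
            elif skip_group:
                continue
            elif word in _SPECIAL:
                out.append(_SPECIAL[word])
            elif word == "uc" and arg is not None:
                uc_skip = int(arg)
            elif word == "u" and arg is not None:
                code = int(arg)
                if code < 0:            # values above 32767 are stored as signed 16-bit
                    code += 0x10000
                out.append(chr(code))   # note: surrogate pairs are not recombined
                pending_skip = uc_skip  # drop the ANSI fallback that follows
            # Every other control word is formatting (\b, \f0, \fs24, ...): ignore it.
        elif hexcode:
            if pending_skip:
                pending_skip -= 1
            elif not skip_group:
                # Assumption: the document uses the default \ansicpg1252 code page.
                out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif char:
            if pending_skip:
                pending_skip -= 1
            elif not skip_group:
                out.append(char)
    return "".join(out)


if __name__ == "__main__":
    sample = (r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}\f0\fs24 "
              r"Hello, {\b world}!\par Caf\'e9 costs \u8364?5\par}")
    print(rtf_to_text(sample))
```

Everything unknown is silently dropped, which matches the "don't need 100% compatibility" goal: the output may lose some content, but no control codes leak into the .txt. Calling something like this from File.text() before the extension is changed would be one way to wire it in, though I haven't checked how File.text() reads its input.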
