Footnote parser #217

Open
plucena24 opened this issue Sep 23, 2018 · 0 comments

It appears that some files get stuck in the “while True” loop of parse_table that deals with footnotes and never hit the break. Took me a while to debug this.

The infinite loops I ran into would call get_footnote forever; that call happens inside the “while True” statement. To validate, I commented out the code block dealing with footnotes and set footnote = None right above the while loop... problem solved.
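
For illustration, here is a minimal sketch of a footnote loop that cannot spin forever. It is not the project's actual code: collect_footnotes and the (footnote, next_index) return shape of the get_footnote callback are assumptions made up for this example. The idea is just to bound the loop and fail loudly if an iteration consumes no input.

```python
def collect_footnotes(lines, start, get_footnote):
    """Collect consecutive footnotes beginning at index `start`.

    `get_footnote(lines, index)` is a hypothetical stand-in for the real
    helper: it returns (footnote_text, next_index), or (None, index) when
    no footnote starts at `index`.
    """
    footnotes = []
    index = start
    while index < len(lines):  # bounded, unlike a bare `while True`
        footnote, next_index = get_footnote(lines, index)
        if footnote is None:
            break  # normal exit: no more footnotes
        if next_index <= index:
            # The helper consumed no input, so looping again would call it
            # with the same arguments forever. Fail loudly instead.
            raise RuntimeError(
                f"footnote parser made no progress at line {index}"
            )
        footnotes.append(footnote)
        index = next_index
    return footnotes, index
```

With a guard like this, a file that trips the parser raises an error naming the line instead of hanging the whole bulk run.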

Note: the code would perform A LOT better if the regexes were pre-compiled once rather than compiled again for each line during parsing. It may not matter once the system is up and only has to parse 1 file a day, but for the initial bulk parsing it makes a big difference :).
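
A rough sketch of what pre-compiling could look like. The pattern and the find_footnote name are made up for illustration (a marker like "1/" at the start of a line); the project's real regexes will differ. Python's re module does cache recently used patterns internally, so how much this saves depends on how many distinct patterns are in play, but a module-level compiled pattern at least skips the per-call cache lookup.

```python
import re

# Hypothetical pattern: a footnote marker such as "1/" at the start of a line.
# Compiled once at import time instead of on every parsed line.
FOOTNOTE_MARKER = re.compile(r"^\s*(\d+)/\s+(.*)$")


def find_footnote(line):
    """Return (number, text) if the line starts with a footnote marker, else None."""
    match = FOOTNOTE_MARKER.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

The same pattern object can then be reused for every line of every file during a bulk parse.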

There is also an assignment of an undefined variable to the key “raw_table”, also within this function. At the start, this key is populated with a variable called “text”, but a few lines below it's reassigned with an undefined variable. This happens twice within the function.

Caught this while compiling the code with Cython (I was debugging the issue thinking it was performance related).

Cython flagged the undefined variable, and also the unused function at the bottom of this file.

