Regex, Header, and Footer Additions for the GUI #873

Open
wants to merge 432 commits into
base: master

slmendez

Apologies for the delayed pull request for the front-end changes! Here are the changes as requested.

Ross Myers and others added 30 commits March 2, 2018 19:46
…amed to page_number in all other locations was still referred to as page)...thanks for pointing this out Shirley!
… if I can get it to store my credentials....
… and the size of the sidebars. Both sidebars weren't set to a size of 100% so I changed that size depending on the size of the window the sidebars look of different lengths.
…ed why in the comments. Will start to work on the last component test case before moving onto user case scenario testing
…art on overlap detection on header/footer resize events now.
… Data page, current issue with a stale element exception being thrown but should be able to fix it.
…t of the component testing! What isn't tested, which is just one button is mentioned why on the test case- shouldn't affect the overall functionality and am okay if that button is not tested for; other comments and small things to fix are mentioned in the comments. Will begin the user scenario testing with the first round of pdf files.
…irefoxDriver, added on a better solution for calling the pdf file in order to utilize it in the test cases, as well as being able to delete the file once done.
…r/footer scale, update back-end parsing to accommodate multiple user-drawn rectangles for a single program invocation.....
…ern before and pattern after and checks that there are zero results
…s, have changed it for it to only refresh when the menu bar disappears. Doesn't necessarily hurt if the pdf bar or pdf view is not present since it doesn't affect the test cases. Will be updating this method to previous test cases that need it. Also implemented a more correct way to verify tabula is gathering the correct data based on the output
…hpage method, as well as added on the extra data verification for regex inputs. Will move on to the next set of user scenarios
…-result row. Shouldn't be causing any problems
… to go back for the last test case for all the pdf files being tested. Moving on to OneStopVotingSiteListNov2012 file
… couldn't find the words, moving on to work on Mecklenburg.Majority.pdf file
…he last two pdf files, starting to work on the Correspondence_FINAL_SBE pdf file
…ue with the overlap test case but I believe it is due to the waits I included for the buttons so I need to go in and change that.
…dant when there was a better way for it to wait. The sleeps that weren't deleted were kept because of necessity
…o I wouldn't hit the exception of it not being able to click the regex button because the highlight rectangle was blocking it.
…nd the regex pane in the Correspondence_FINAL_SBE file. Moving on to the file test case that needs to be fixed
@jeremybmerrill
Member

Hi @slmendez Thanks! I'm aiming to take a look at this this weekend.

@jeremybmerrill
Member

Hi @slmendez and everyone --

Thank you again for your hard work on this, it looks awesome. A super useful new set of functionalities. I took a long look at the code and at the interface and I have a few questions, before I start merging, to make sure I understand what stuff is supposed to do. Since you clearly have thought through these new features, I want to make sure I get as much of your thought process as possible.

I love that you all added tests, added social media links to the bottom of the homepage and added comments.

No particular rush; I'll email you all as well.

  • Does the regex search depend on anything happening during the upload step? (If someone already had PDFs they had uploaded with the current Tabula, then upgraded to Tabula with your changes, would Regex Search work with their older PDFs?)
  • What's the thinking behind making the regex-created selections not deletable or moveable?
  • What is the PDF Outline button supposed to do? Just hide the sidebar, for more space?
  • What's this green/blue header stuff for? (Is it right that it excludes areas from regex search?) What's the use case you all had in mind for this?
  • Can you describe the testing infrastructure that's included here? We obviously didn't have any tests on the Tabula GUI before, but, obviously, it's great. How do I run it? What are the top-level tools you chose to use for the tests? ("we do frontend testing with ____ framework"?) I know basically nothing about testing JavaScript apps like this.

Thanks again, y'all.

@dbmarsh

dbmarsh commented Jul 15, 2018

I can probably answer the middle three questions well enough, though Shirley may have some feedback/points that I'm not taking into account. We have a "final project documentation" folder available on the shared Google Drive space that may provide more detail on the front-end side of the software that I'm not up-to-date on, myself; I'll email a copy to you and Manuel as well in case that's any more convenient. Let me know if you'd like me to clarify or elaborate on anything below!

- What's the thinking behind making the regex-created selections not deletable or moveable?

I think that the selections are deletable via clicking on their respective searches listed in the box/table to the bottom right of the GUI. Beyond that, we didn't consider the need of moving them with our new functionality.

- What is the PDF Outline button supposed to do? Just hide the sidebar, for more space?

I believe that is correct, just more of a minor point that we wanted for viewers' flexibility with screen space.

- What's this green/blue header stuff for? (Is it right that it excludes areas from regex search?) What's the use case you all had in mind for this?

The drag down/up is for headers and footers, primarily to keep Tabula from extracting garbage like page numbers or static annotations/descriptions. It does exclude those areas from the search. When talking with Coleman, we were primarily considering page numbers because of the sheer amount of undesired output they create. We'd also discussed incorporating page-by-page header and footer "exclusion regions," but decided that it wouldn't be a good use of development time at this point.
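
To make the exclusion idea concrete, here is a minimal sketch of filtering out text that falls inside a header or footer band before any search runs. It is not the code from this pull request: `TextLine`, `HeaderFooterFilter`, the coordinate convention (distances measured from the top of the page), and the rule that a line merely touching a band is dropped are all assumptions made for illustration. The same two bands are applied to every page, matching the decision above not to support page-by-page regions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one line of text on a PDF page (not Tabula's own classes).
final class TextLine {
    final int page;      // 1-based page number
    final double top;    // distance from the top of the page to the line's top edge
    final double bottom; // distance from the top of the page to the line's bottom edge
    final String text;

    TextLine(int page, double top, double bottom, String text) {
        this.page = page;
        this.top = top;
        this.bottom = bottom;
        this.text = text;
    }
}

// Hypothetical filter: one header band and one footer band shared by all pages.
final class HeaderFooterFilter {
    private final double pageHeight;
    private final double headerHeight;
    private final double footerHeight;

    HeaderFooterFilter(double pageHeight, double headerHeight, double footerHeight) {
        this.pageHeight = pageHeight;
        this.headerHeight = headerHeight;
        this.footerHeight = footerHeight;
    }

    // Assumption: a line is excluded if any part of it overlaps either band.
    boolean isExcluded(TextLine line) {
        boolean inHeader = line.top < headerHeight;
        boolean inFooter = line.bottom > pageHeight - footerHeight;
        return inHeader || inFooter;
    }

    // Returns only the lines that lie entirely between the two bands, in input order.
    List<TextLine> keepBody(List<TextLine> lines) {
        List<TextLine> body = new ArrayList<>();
        for (TextLine line : lines) {
            if (!isExcluded(line)) {
                body.add(line);
            }
        }
        return body;
    }
}
```

With a filter like this, a page number sitting in the footer band never reaches the regex search or the extracted output, whichever page it appears on.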

@jeremybmerrill
Member

@dbmarsh Thanks! I appreciate all the explanation and the additional documentation on Google. I see now how I can delete a regex search.

More questions on header/footer:
Just in terms of how you all anticipate the header/footer draggables being used:

  • It's just for excluding headers/footers in regex search selections? Is it supposed to do anything if I'm doing an old-fashioned click-and-drag selection?
  • It only makes sense if the regex-defined selection spans pages, right?

And with regex search generally -- and you all may have gone over this in the past, sorry for the repetition -- is the idea that multiple similarly-laid-out PDFs would be processed with the same set of regexes?

@dbmarsh

dbmarsh commented Jul 16, 2018

- It's just for excluding headers/footers in regex search selections? Is it supposed to do anything if I'm doing an old-fashioned click-and-drag selection?

I can't recall working much with click-and-drag on a machine with our new developments, since our demonstrations were primarily meant to highlight differences from Tabula's original capabilities. Thinking about the back end, though, I believe the header/footer exclusion checks are carried out alongside the regex pair searches, so I would say no effect is intended on click-and-drag functionality. However, I may be mistaken, and I won't have access to my computer with everything set up to play around with it for the next few days.

- It only makes sense if the regex-defined selection spans pages, right?

There may be some corner cases - e.g. where the regex start or end phrases are found in the header or footer of a single page but these matches are not desired - where the exclusion functionality has the potential to give users additional control over the search capabilities, but in terms of our intentions it was certainly built with multi-page regions in mind.

- And with regex search generally -- and you all may have gone over this in the past, sorry for the repetition -- is the idea that multiple similarly-laid-out PDFs would be processed with the same set of regexes?

From our discussion over the course of the semester, it seems that the regex search capabilities would provide the most benefit in two cases: (1) batch processing of similarly-laid-out PDFs; and (2) extremely large PDFs with many instances of matching regions. Most of our testing and demonstration have focused on the former, but there is potential with respect to the latter when considering the alternative would be manually locating and selecting these regions via the click-and-drag approach. In general, though, I agree with the sentiment above.
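
As a rough illustration of why a regex pair helps with large or repetitive PDFs, the sketch below finds every region that starts at a line matching one pattern and ends at the next line matching another. It is only a sketch under stated assumptions: `RegexRegionFinder` and `findRegions` are hypothetical names, the input is a plain list of text lines rather than Tabula's page model, both boundary lines are treated as part of the region, and a start with no later end is dropped. The actual back end may define the boundaries differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tabula: finds line ranges delimited by two regexes.
final class RegexRegionFinder {

    // Returns {startIndex, endIndex} pairs (0-based, inclusive), in document order.
    static List<int[]> findRegions(List<String> lines, String patternBefore, String patternAfter) {
        Pattern before = Pattern.compile(patternBefore);
        Pattern after = Pattern.compile(patternAfter);
        List<int[]> regions = new ArrayList<>();
        int start = -1; // -1 means no region is currently open

        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (start < 0) {
                // Looking for the next start phrase.
                if (before.matcher(line).find()) {
                    start = i;
                }
            } else if (after.matcher(line).find()) {
                // A region is open and this line closes it.
                regions.add(new int[] {start, i});
                start = -1;
            }
        }
        // Assumption: a start phrase without a matching end phrase yields no region.
        return regions;
    }
}
```

One pair of patterns yields every matching region in a single pass, which is the work a user would otherwise repeat by hand with click-and-drag selections, and the same pair can be reused across similarly laid out files.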

@slmendez
Author

Hi @jeremybmerrill, sorry for taking so long to respond to your questions!

I'll be adding on to what David has answered:

Can you describe the testing infrastructure that's included here? We obviously didn't have any tests on the Tabula GUI before, but, obviously, it's great. How do I run it? What are the top-level tools you chose to use for the tests? ("we do frontend testing with ____ framework"?) I know basically nothing about testing JavaScript apps like this.

The testing framework is built on Selenium, and all the test cases are written in Java. Half of the test cases exercise the page functionality itself (clicking links, buttons, etc.); the other half run real PDF files through Tabula and validate the output. The test cases run on Travis CI, where the results can be checked; Ross helped me set up the Travis file for this. Please let me know if you have trouble running them!

One more point about the test cases: all of them depend on how long the page takes to load, so if the page unexpectedly takes too long, a test case can fail even though it was given an allotted time to wait. I've looked into this, and it seems to be a common problem with front-end tests that depend on page load time.
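
A common way to reduce this kind of timing failure is to poll for a condition up to a deadline instead of sleeping for a fixed time. Selenium offers explicit waits for this purpose (`WebDriverWait` together with `ExpectedConditions`). The sketch below shows the same idea using only the Java standard library, so it runs without a browser; `PollingWait` and `waitUntil` are hypothetical names and not part of the test suite in this pull request.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating a poll-until-deadline wait; not the PR's test code.
final class PollingWait {

    // Checks the condition repeatedly until it holds or the timeout elapses.
    // Returns true as soon as the condition holds, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A test using this pattern returns as soon as the page is ready and only fails after the full timeout, so a generous timeout does not slow down the normal case the way a long fixed sleep would.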

Lastly, our final project documentation includes, I believe under the documentation folder, documents for the front-end and back-end testing. They go into more detail about the PDF files used and what each test case does.

Please let me know if you have any more questions!

@jeremybmerrill
Member

@slmendez Thanks! I've got the tests running now, I think.

@dbmarsh: Great, thanks! Very helpful to hear the thinking behind this.

Will let y'all know if I have more questions.

3 participants