Regex, Header, and Footer Additions for the GUI #873

slmendez · 2018-06-13T02:55:43Z

Apologies for the delayed pull request for the front-end changes! Here are the changes, as requested.

jeremybmerrill · 2018-06-15T14:23:23Z

Hi @slmendez Thanks! I'm aiming to take a look at this this weekend.

jeremybmerrill · 2018-07-14T17:18:59Z

Hi @slmendez and everyone --

Thank you again for your hard work on this, it looks awesome. A super useful new set of functionalities. I took a long look at the code and at the interface and I have a few questions, before I start merging, to make sure I understand what stuff is supposed to do. Since you clearly have thought through these new features, I want to make sure I get as much of your thought process as possible.

I love that you all added tests, added social media links to the bottom of the homepage and added comments.

No particular rush; I'll email you all as well.

- Does the regex search depend on anything happening during the upload step? (If someone already had PDFs they had uploaded with the current Tabula, then upgraded to Tabula with your changes, would Regex Search work with their older PDFs?)
- What's the thinking behind making the regex-created selections not deletable or moveable?
- What is the PDF Outline button supposed to do? Just hide the sidebar, for more space?
- What's this green/blue header stuff for? (Is it right that it excludes areas from regex search?) What's the use case you all had in mind for this?
- Can you describe the testing infrastructure that's included here? We obviously didn't have any tests on the Tabula GUI before, but, obviously, it's great. How do I run it? What are the top-level tools you chose to use for the tests? ("we do frontend testing with ____ framework"?) I know basically nothing about testing JavaScript apps like this.

Thanks again, y'all.

dbmarsh · 2018-07-15T12:56:40Z

I can probably answer the middle three questions well enough, though Shirley may have some feedback/points that I'm not taking into account. We have a "final project documentation" folder available on the shared Google Drive space that may provide more detail on the front-end side of the software that I'm not up-to-date on, myself; I'll email a copy to you and Manuel as well in case that's any more convenient. Let me know if you'd like me to clarify or elaborate on anything below!

- What's the thinking behind making the regex-created selections not deletable or moveable?

I think that the selections are deletable by clicking on their respective searches listed in the box/table at the bottom right of the GUI. Beyond that, we didn't consider the need to move them with our new functionality.

- What is the PDF Outline button supposed to do? Just hide the sidebar, for more space?

I believe that is correct, just more of a minor point that we wanted for viewers' flexibility with screen space.

- What's this green/blue header stuff for? (Is it right that it excludes areas from regex search?) What's the use case you all had in mind for this?

The drag down/up is for headers and footers, primarily to keep Tabula from extracting garbage like page numbers or static annotations/descriptions. It does exclude those areas from the search. When talking with Coleman, we were primarily considering page numbers due to the raw amount of undesired output they create. We'd also discussed the option of incorporating page-by-page header and footer "exclusion regions," though decided that it wouldn't be a good use of development time at this point.
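As a rough illustration of the exclusion idea described above (not the actual back-end code): the sketch below assumes a single header band and a single footer band applied to every page, with y coordinates measured downward from the top edge of the page. The class name `ExclusionBands`, its methods, and the example dimensions are all hypothetical.

```java
// Hypothetical sketch: names and the coordinate convention are assumptions,
// not Tabula's actual API.
class ExclusionBands {
    final double pageHeight;   // page height; y is measured downward from the top edge
    final double headerHeight; // height of the excluded band at the top of each page
    final double footerHeight; // height of the excluded band at the bottom of each page

    ExclusionBands(double pageHeight, double headerHeight, double footerHeight) {
        // Reject bands that would overlap, similar in spirit to overlap detection on resize.
        if (headerHeight + footerHeight > pageHeight) {
            throw new IllegalArgumentException("header and footer bands overlap");
        }
        this.pageHeight = pageHeight;
        this.headerHeight = headerHeight;
        this.footerHeight = footerHeight;
    }

    // A text line spanning [top, bottom] is searchable only if it lies entirely between the bands.
    boolean isSearchable(double top, double bottom) {
        return top >= headerHeight && bottom <= pageHeight - footerHeight;
    }

    // Clips a vertical span [top, bottom] to the searchable area; returns null if nothing remains.
    double[] clip(double top, double bottom) {
        double clippedTop = Math.max(top, headerHeight);
        double clippedBottom = Math.min(bottom, pageHeight - footerHeight);
        return clippedTop < clippedBottom ? new double[] {clippedTop, clippedBottom} : null;
    }

    public static void main(String[] args) {
        ExclusionBands bands = new ExclusionBands(792.0, 50.0, 40.0); // example values only
        System.out.println(bands.isSearchable(10.0, 20.0));   // a line inside the header band
        System.out.println(bands.isSearchable(100.0, 112.0)); // a line in the page body
    }
}
```

Because the same two bands apply to every page, a page number printed at a fixed position is filtered out on all pages with a single adjustment; per-page bands would need one such object per page.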

jeremybmerrill · 2018-07-16T01:32:45Z

@dbmarsh Thanks! I appreciate all the explanation and the additional documentation on Google. I see now how I can delete a regex search.

More questions on header/footer:
Just in terms of how you all anticipate the header/footer draggables being used:

- It's just for excluding headers/footers in regex search selections? Is it supposed to do anything if I'm doing an old-fashioned click-and-drag selection?
- It only makes sense if the regex-defined selection spans pages, right?

And with regex search generally -- and you all may have gone over this in the past, sorry for the repetition -- is the idea that multiple similarly-laid-out PDFs would be processed with the same set of regexes?

dbmarsh · 2018-07-16T02:57:07Z

- It's just for excluding headers/footers in regex search selections? Is it supposed to do anything if I'm doing an old-fashioned click-and-drag selection?

I can't recall ever working much with click-and-drag on a machine with our new developments, since our demonstrations were primarily meant to highlight differences from Tabula's original capabilities. Thinking about the back end, though, I believe the header/footer exclusion checks are carried out alongside the regex pair searches, so I would say no effect is intended on click-and-drag functionality. However, I may be mistaken, and I won't have access to my computer with everything set up to play around with it for the next few days.

- It only makes sense if the regex-defined selection spans pages, right?

There may be some corner cases - e.g. where the regex start or end phrases are found in the header or footer of a single page but these matches are not desired - where the exclusion functionality has the potential to give users additional control over the search capabilities, but in terms of our intentions it was certainly built with multi-page regions in mind.
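To make the corner case concrete, here is a minimal sketch of a "pattern before" / "pattern after" search over text lines from several pages, where lines flagged as header or footer text can neither open nor close a region. It is an illustration under assumptions, not the back-end implementation: the `RegexRegions` and `Line` names are hypothetical, text is modeled as a flat list of lines, the inclusive option is ignored, and a region left open at the end of the document is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; the class, field, and method names are assumptions.
class RegexRegions {
    // One line of extracted text; 'excluded' marks lines inside a header/footer band.
    static final class Line {
        final int page;
        final String text;
        final boolean excluded;

        Line(int page, String text, boolean excluded) {
            this.page = page;
            this.text = text;
            this.excluded = excluded;
        }
    }

    // Returns, for each region, the indices of the non-excluded lines strictly
    // between a 'before' match and the next 'after' match.
    static List<List<Integer>> find(List<Line> lines, Pattern before, Pattern after) {
        List<List<Integer>> regions = new ArrayList<>();
        List<Integer> current = null; // the open region, or null when none is open
        for (int i = 0; i < lines.size(); i++) {
            Line line = lines.get(i);
            if (line.excluded) {
                continue; // header/footer text is never matched and never collected
            }
            if (current == null) {
                if (before.matcher(line.text).find()) {
                    current = new ArrayList<>(); // region opens after this line
                }
            } else if (after.matcher(line.text).find()) {
                regions.add(current); // region closes before this line
                current = null;
            } else {
                current.add(i);
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        List<Line> lines = new ArrayList<>();
        lines.add(new Line(1, "Report Header Totals", true)); // header repeats the start phrase
        lines.add(new Line(1, "Totals", false));
        lines.add(new Line(1, "row 1", false));
        lines.add(new Line(1, "Page 1", true));
        lines.add(new Line(2, "row 2", false));
        lines.add(new Line(2, "End of table", false));
        System.out.println(find(lines, Pattern.compile("Totals"), Pattern.compile("End of table")));
    }
}
```

In the sample data the header line also contains the start phrase, which is the single-page corner case mentioned above; the page footer between the two body rows shows the multi-page case.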

- And with regex search generally -- and you all may have gone over this in the past, sorry for the repetition -- is the idea that multiple similarly-laid-out PDFs would be processed with the same set of regexes?

From our discussion over the course of the semester, it seems that the regex search capabilities would provide the most benefit in two cases: (1) batch processing of similarly-laid-out PDFs; and (2) extremely large PDFs with many instances of matching regions. Most of our testing and demonstration have focused on the former, but there is potential with respect to the latter when considering the alternative would be manually locating and selecting these regions via the click-and-drag approach. In general, though, I agree with the sentiment above.

slmendez · 2018-07-16T04:10:54Z

Hi @jeremybmerrill , sorry for taking me so long to respond back to your questions!

I'll be adding on to what David has answered:

- Can you describe the testing infrastructure that's included here? We obviously didn't have any tests on the Tabula GUI before, but, obviously, it's great. How do I run it? What are the top-level tools you chose to use for the tests? ("we do frontend testing with ____ framework"?) I know basically nothing about testing JavaScript apps like this.

The testing framework is built on Selenium, and all the test cases are written in Java. Half of the test cases test the page functionality itself (clicking links, buttons, etc.), and the other half run real PDF files through Tabula and validate the output. The test cases can be run and checked on Travis CI; Ross helped me set up the Travis file so that it runs them and shows the results there. Please let me know if you have trouble running them!

Another point to bring up about the test cases is that all of them depend on how long the page takes to load, so if the page unexpectedly takes too long to load, the test case simply fails, despite the time allotted for it to wait. I've looked into this, and it seems to be a pretty common issue with front-end tests that depend on page load time.

Lastly, in our final project documentation, I believe the documentation folder has documents for the front-end and back-end testing. They go into more detail about the PDF files used and what each test case does.
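For context, the usual mitigation for this kind of timing sensitivity is to poll for a condition with a timeout instead of sleeping for a fixed time. Selenium ships its own explicit-wait helper (`WebDriverWait`) for this; the sketch below is only a plain-Java stand-in for the same idea so that it runs without a browser, and the `PollingWait` class and its parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for an explicit wait; not Selenium's API.
class PollingWait {
    // Re-checks 'condition' every pollMillis until it holds or timeoutMillis has elapsed.
    // Returns true as soon as the condition holds, false on timeout or interruption.
    static boolean until(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition ever holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] checks = {0};
        // Simulates a page element that only "appears" on the third check.
        boolean appeared = until(() -> ++checks[0] >= 3, 2000, 5);
        System.out.println(appeared + " after " + checks[0] + " checks");
    }
}
```

A fast page returns almost immediately and a slow page gets the full timeout, which is why this tends to be less flaky than a fixed `Thread.sleep()`; it still fails if the page takes longer than the timeout.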

Please let me know if you have any more questions!

jeremybmerrill · 2018-07-17T02:54:15Z

@slmendez Thanks! I've got the tests running now, I think.

@dbmarsh: Great, thanks! Very helpful to hear the thinking behind this.

Will let y'all know if I have more questions.

Ross Myers and others added 30 commits March 2, 2018 19:46

Merging in Shirley's test updates....

a01f3a2

Identified issue with auto-detected selections button (a variable ren…

f3315a7

…amed to page_number in all other locations was still referred to as page)...thanks for pointing this out Shirley!

Removed some debugging statements from the code, just tidying up a bi…

12ebb7b

…t...

Small cosmetic code changes....messing around in git right now to see…

a665fd1

… if I can get it to store my credentials....

Made minor changes to the extraction page regarding the shadow colors…

d46d603

… and the size of the sidebars. Both sidebars weren't set to a size of 100% so I changed that size depending on the size of the window the sidebars look of different lengths.

Wrapped up the TestExtractionPage, what isn't being tested is mention…

3348c11

…ed why in the comments. Will start to work on the last component test case before moving onto user case scenario testing

Merge remote-tracking branch 'origin/Header_Addition' into Header_Add…

dc705f9

…ition # Conflicts: # .idea/workspace.xml

Cleaning up some code regarding header/footer resize events...will st…

d79da0d

…art on overlap detection on header/footer resize events now.

In process of adding overlap detection for header/footer resize event…

ce4fa6d

…s...

Have first-pass overlap detection in for header/footer resize events...

e994440

Merging in Shirley's test cases....

70f526e

Added on majority of the components tested in Test Preview and Export…

ba55898

… Data page, current issue with a stale element exception being thrown but should be able to fix it.

Merge remote-tracking branch 'origin/Header_Addition' into Header_Add…

7d7387c

…ition

Added on an environment variable to see if I can see the build result…

1410939

…s on travis CI

Update to travis for front end repo

3ba265d

Fixed the travis file to mimic the one in the java repo to see if tha…

b3a3cc8

…t will fix why the build is failing

Added on pathname of the geckodriver in one of the test case files

f2529ba

Added on more commands before the script for the travis file, testing…

427a0f9

… to see if it will work

Changed for all test cases so far to use ChromeDriver as opposed to F…

cbd078a

…irefoxDriver, added on a better solution for calling the pdf file in order to utilize it in the test cases, as well as being able to delete the file once done.

GUI-CLI converter script in progress....need to add support for heade…

38a696f

…r/footer scale, update back-end parsing to accommodate multiple user-drawn rectangles for a single program invocation.....

Added on some beginning test cases for eu-002.pdf, testing for the re…

b767f0b

…gex search disabled button

Added on a new method to test for incorrect inputs for pattern after …

74d544d

…and pattern before

Added on another test case that inputs 2 incorrect inputs to the patt…

9826e87

…ern before and pattern after and checks that there are zero results

Added on more test scenarios of inputting a common word and a correct…

8ee6a71

… input to pattern before and pattern after

Made another method to eliminate some redundant code and fixed up so…

829cdd7

…me previous test cases

Fixed up and cleaned up some prior test cases that utilize the refres…

65e1e3d

…hpage method, as well as added on the extra data verification for regex inputs. Will move on to the next set of user scenarios

Got rid of more redundant code, updated the pom file, and added on 3 …

f545361

…more test scenarios that use inclusive

Added on test case that tests a text-based image and verifies it's data

5f4a3af

slmendez added 21 commits April 24, 2018 10:52

Adding on the overlap test case for NCHouse2017StatPack file

e20db5f

Getting a long lag again..trying to see if I can get the test case to…

f06405a

… wait until a regex result appears

Change to -x

3004fcb

Went back to old changes before changing the index.html for the regex…

30923fe

…-result row. Shouldn't be causing any problems

Fixed up the overlap test case for the HCHouse2017StatPack, will need…

28fb4e1

… to go back for the last test case for all the pdf files being tested. Moving on to OneStopVotingSiteListNov2012 file

Completed another test case for multiple regex searches/inclusive and…

4cc37b6

… non-inclusive

Fixed up the multiple searches test cases that was failing because it…

e98d95c

… couldn't find the words, moving on to work on Mecklenburg.Majority.pdf file

Added on test cases for Mecklenburg.Majority pdf file. Will wrap up t…

8d6662b

…he last two pdf files, starting to work on the Correspondence_FINAL_SBE pdf file

Added on the test cases for Correspondence_FINAL_SBE. There is an iss…

75a7168

…ue with the overlap test case but I believe it is due to the waits I included for the buttons so I need to go in and change that.

Finishing up the test cases, going to go in and fix the ones with exc…

e35df90

…eptions thrown

Got rid of some of the thread.sleep() for the eu_002 case

80fcdfc

Got rid of thread.sleep() for the onestopvotinglistnov2012

f03ca20

Got rid of NCHouse2017StatPack's thread.sleep() that were being redun…

ef6e044

…dant when there was a better way for it to wait. The sleeps that weren't deleted were kept because of necessity

Fixed up the Feb_9_2016 test file, changed the overlap regex inputs s…

b91743a

…o I wouldn't hit the exception of it not being able to click the regex button because the highlight rectangle was blocking it.

Fixed exception thrown because of an overlay of the drawn rectangle a…

2f90df9

…nd the regex pane in the Correspondence_FINAL_SBE file. Moving on to the file test case that needs to be fixed

Wrapped up the last exception that was being thrown in boron_isotopic

fb1c5db

Updated run.sh file to clone the new repo

5448802

Didn't notice that I had set the location of the file to my local di…

6bbfc3b

…rectory, should be fixed now

Fixed up the previewandexportdata test case that was throwing errors

e2547fc

Merge branch 'Header_Addition' of https://github.com/redmyers/484_P7_…

84e2ee3

…1-GUI

Changes from the merge

fe92439

jeremybmerrill mentioned this pull request Jun 26, 2018

Regex, Header, and Footer Additions for the GUI #881

Closed

jeremybmerrill mentioned this pull request Sep 22, 2018

"Search by text" (aka regex search) #922

Open
