Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Search by text" (aka regex search) #922

Open
wants to merge 435 commits into
base: master
Choose a base branch
from
Open

Conversation

jeremybmerrill
Copy link
Member

supersedes #873 by @slmendez et al.

@jazzido, @mtigas would love your eyes on this one. I propose that we aim to do a Beta/Preview release, so some users can take it for a spin and give us feedback. Then see where we're at.

I've done some substantial cleanup, moving raw CSS into SCSS, rewriting some event handlers, putting templates in the right kinds of places, putting the Search by Text toolbar behind a button. I also wrote through all the copy.

The Regex Search I've renamed Search by text so non-regex aware users can use it (but the guts are the same, it still supports regex). One question I'm still pondering is whether the searches created by the regex search should be individually deleteable/movable/editable.

One big win from this is testing: mvn test tests the UI, which is a thing we've needed for a few years.

slmendez and others added 30 commits March 8, 2018 02:17
… Data page, current issue with a stale element exception being thrown but should be able to fix it.
…t of the component testing! What isn't tested, which is just one button is mentioned why on the test case- shouldn't affect the overall functionality and am okay if that button is not tested for; other comments and small things to fix are mentioned in the comments. Will begin the user scenario testing with the first round of pdf files.
…irefoxDriver, added on a better solution for calling the pdf file in order to utilize it in the test cases, as well as being able to delete the file once done.
…r/footer scale, update back-end parsing to accommodate multiple user-drawn rectangles for a single program invocation.....
…ern before and pattern after and checks that there are zero results
…s, have changed it for it to only refresh when the menu bar disappears. Doesn't necessary hurt if the pdf bar or pdf view is not present since it doesn't affect the test cases. Will be updating this method to previous test cases that need it. Also implemented a more correct way to verify tabula is gathering the correct data based on the output
…hpage method, as well as added on the extra data verification for regex inputs. Will move on to the next set of user scenarios
…iple Page Regex Test Should now pass...furthermore, the extraction options have been explicitly baked into the command line option so that this ambiguity *should* not be encountered in the future...should be noted that it is an ongoing issue within Tabula regarding extraction methods when multiple tables are specified....
…...NOTE: still need to manually test the options to double-check the command line unit tests
…ader_Addition

# Conflicts:
#	.idea/workspace.xml
…nning table. The 5 page spanning table scenario is running into an internal error, and after confirming with Ben testing out the scenario as well, this may be something we may have to further look into.
…s (calculate scale in front-end for data efficiency + simpler to generate CLI args)
…ll move on to work on the test case for multiple regex searches
@jazzido
Copy link
Contributor

jazzido commented Sep 27, 2018

Thanks for taking care of this, @jeremybmerrill — I'll try to find some time this weekend to review.

BTW, I went ahead and deleted some files from the root directory (old versions of the PDFBox jar (!), an IDEA .iml file, et cetera)

@jazzido
Copy link
Contributor

jazzido commented Sep 27, 2018

BTW, we should do a merge from master onto this PR. The branch point for this feature is ages old.

@jeremybmerrill
Copy link
Member Author

Thanks. I'll check out merging master back in when I get a chance. (It can't be that old, can it? It's not like there've been many commits in the past year.)

@jazzido
Copy link
Contributor

jazzido commented Sep 27, 2018

(It can't be that old, can it? It's not like there've been many commits in the past year.)

True. I mentioned that anyway because of the list of conflicted files that Github shows:

 .ruby-version
.travis.yml
Gemfile
Gemfile.lock
Jarfile
Jarfile.lock
README.md
lib/tabula_java_wrapper.rb
lib/tabula_workspace.rb
webapp/index.html
webapp/static/css/selectors.css
webapp/static/css/styles.css
webapp/static/js/debug_pdf_view.js
webapp/static/js/library.js
webapp/static/js/pdf_view.js
webapp/static/js/rectangularSelector.js
webapp/static/js/resizableSelection.js
webapp/static/js/tabula.js
webapp/static/sass/selectors.scss
webapp/static/sass/styles.scss
webapp/tabula_settings.rb
webapp/tabula_web.rb 

@jeremybmerrill
Copy link
Member Author

@jazzido : I think I fixed the conflicts. I'm not sure exactly where you were looking, but can you check again?

I'm not sure why only one of the Travis checks passed, but the other failed. Might just be a sporadic erro (the thing that failed is just checking for a button's presence... and that button is there (and it works on my computer).)

@cdcv
Copy link

cdcv commented Feb 17, 2020

Does this branch/PR make it possible to select/position tables relative to the xy coordinates of search text in a PDF document? I recently opened a feature request issue for this feature:
https://github.com/tabulapdf/tabula/issues/1101
... and then I found this PR.
If so, is there a working build of this that we might use? We don't necessarily need to use the GUI if that is holding up merge/use of this functionality - we could use a command-line based version and would love to have one if it exists.
If not, are there other approaches to doing this within Tabula, or other tools that can do this?
Thank you for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants