Regex file selection duplicates OS globbing/regex functionality #67

Open
ghost opened this issue Jun 14, 2017 · 3 comments

ghost commented Jun 14, 2017

This issue report is based on using this JAR.

The regex file selection functionality in norma adds complexity without adding value.

That is because this functionality duplicates what is already offered by shells on modern operating systems, such as GNU Bash or Windows PowerShell. Such shells already feature globbing- and regular expression-based file selection.

For example:

$ java -jar norma-0.5.0-SNAPSHOT-jar-with-dependencies.jar --project publicPapers --fileFilter '.*/(.*).pdf' --makeProject '(\1)/fulltext.pdf'

could more naturally be expressed in Bash with a glob like:

$ java -jar norma-0.5.0-SNAPSHOT-jar-with-dependencies.jar --infiles publicPapers/*.pdf --makeProject 'fulltext.pdf'

If one's matching criteria require a regex rather than just a glob, this is also available with standard OS tools:

$ java -jar norma-0.5.0-SNAPSHOT-jar-with-dependencies.jar --infiles "$(find publicPapers -maxdepth 1 -type f -iregex '.*/\(pub\|phm\)[^/]*\.pdf')" --makeProject 'fulltext.pdf'
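
To make the two selection styles concrete, here is a small sketch that runs on throwaway files; the file names are invented for illustration, and GNU find is assumed. It only counts what each pattern selects and does not call norma. The file-name part of the regex uses `[^/]*` rather than `.*`, because `find` matches against the whole path and the directory name `publicPapers` itself begins with `pub`, so a looser pattern would accept every PDF in that directory.

```shell
set -eu

# Throwaway directory with invented file names, for illustration only.
workdir=$(mktemp -d)
mkdir "$workdir/publicPapers"
touch "$workdir/publicPapers/pub1.pdf" \
      "$workdir/publicPapers/phm2.pdf" \
      "$workdir/publicPapers/other.pdf" \
      "$workdir/publicPapers/notes.txt"

# Glob: the shell expands the pattern into one argument per matching file.
set -- "$workdir"/publicPapers/*.pdf
glob_count=$#
printf 'glob matched %s files\n' "$glob_count"

# Regex: find applies the pattern to the whole path and prints one match per line.
regex_count=$(find "$workdir/publicPapers" -maxdepth 1 -type f \
  -iregex '.*/\(pub\|phm\)[^/]*\.pdf' | wc -l)
printf 'regex matched %s files\n' "$regex_count"
```

The glob picks up all three PDFs, while the regex keeps only those whose file name starts with `pub` or `phm`.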

Removing regex CLI functionality from norma would provide the following benefits:

- simpler documentation
- easier learning curve
- reduced code complexity
- easier maintenance.

@ghost ghost added the architecture label Jun 14, 2017

petermr commented Jun 15, 2017

This is misconceived and unnecessary. This is a regex and not a glob, and the capture groups are used to rename parts of the tree.

@petermr petermr closed this as completed Jun 15, 2017
@ghost ghost changed the title Globbing duplicates OS functionality Regex duplicates OS globbing/regex functionality Jun 15, 2017
@ghost ghost changed the title Regex duplicates OS globbing/regex functionality Regex file selection duplicates OS globbing/regex functionality Jun 15, 2017

ghost commented Jun 15, 2017

@petermr wrote:

> This is a regex and not a glob

Strictly speaking, that is true, and I have amended the wording accordingly. However, a glob would be adequate in many cases, including the invocations described at http://discuss.contentmine.org/t/extracting-data-from-tilburg-funnel-plot-diagrams/386, and I have now illustrated that in my opening comment above.

> and the capture groups are used to rename parts of the tree

In all the example norma invocations I have observed so far, this would be better handled as described in my opening comment.
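
As a rough sketch of how the renaming could be done with ordinary shell tools, assume that `--fileFilter '.*/(.*).pdf' --makeProject '(\1)/fulltext.pdf'` is meant to place each `NAME.pdf` at `NAME/fulltext.pdf` inside the project directory. That reading is an assumption, not confirmed norma behaviour, and `make_project` is a hypothetical helper name. The sketch copies rather than moves, and runs on throwaway files.

```shell
set -eu

# Hypothetical helper: copy each DIR/NAME.pdf to DIR/NAME/fulltext.pdf.
# The target layout is an assumption about what --makeProject is meant to produce.
make_project() {
  local dir=$1 pdf name
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue        # the glob matched nothing
    name=$(basename "$pdf" .pdf)     # plays the role of capture group \1
    mkdir -p "$dir/$name"
    cp "$pdf" "$dir/$name/fulltext.pdf"
  done
}

# Demo on throwaway data so nothing real is touched.
workdir=$(mktemp -d)
mkdir "$workdir/publicPapers"
touch "$workdir/publicPapers/paper1.pdf" "$workdir/publicPapers/paper2.pdf"
make_project "$workdir/publicPapers"
find "$workdir/publicPapers" -name fulltext.pdf | sort
```

Here `basename "$pdf" .pdf` stands in for the capture group: it recovers the part of the file name that the regex would have captured.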

> This is misconceived and unnecessary.

Surely these benefits, stated in my opening comment, are neither misconceived nor undesirable:

- simpler documentation
- easier learning curve
- reduced code complexity
- easier maintenance.

Therefore re-opening, as unresolved.

@ghost ghost reopened this Jun 15, 2017

ghost commented Jun 15, 2017

Related: ContentMine/cproject#3
