Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for capturing group in tag inclusion. #21

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mgao6767
Copy link

This PR allows the use of capturing group when using regular expression to determine item inclusion by tags.
If tag_include_re contains capturing group, the tags will be the concatenation of captured substrings.

For example:

  • In Zotero, I have tags of "-kerko-Topic 1", "-kerko-Topic 2".
  • In config.toml, I set tag_include_re to "-kerko-(.*)". Enable kerko.facets.tag.
  • In the web, facets will be "Topic 1" and "Topic 2", instead of "-kerko-Topic 1", "-kerko-Topic 2".

The changes pass all tests. Works as expected.

@davidlesieur
Copy link
Member

Sounds like a useful feature, will review it! Even though the tests pass, one should not rely on them too much when working on extractors, as there are currently no specific tests for covering the extractors. Of course you are welcome to write tests for this extractor, should you feel inclined to. 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change does not seem relevant to the requested feature.

match = self.include.match(tag)
if match:
# Check if there are capturing groups
if len(match.groups()) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if match.groups(): would be a simpler way to express the same condition.

if match:
# Check if there are capturing groups
if len(match.groups()) > 0:
tag = ''.join(match.groups())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this could be problematic if the regular expression happens to have multiple or nested groups. It looks like the user would need a way to tell the extractor which group to use (group index, or group name, for regular expressions in the form of (?P<name>...)).

@davidlesieur davidlesieur added the enhancement New feature or request label Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants