Consistent treatment of attributes #70

Open
laurachiticariu opened this issue Jun 22, 2022 · 2 comments · May be fixed by #141
Labels
enhancement New feature or request med-priority

Comments

@laurachiticariu
Collaborator

laurachiticariu commented Jun 22, 2022

Currently attributes are allowed to be renamed and turned off on the Sequence node. We should enable this for all nodes.

Here is the desired behavior.

Dictionary (Map Terms = OFF) and Literal:

  • A single attribute, named as the node by default
  • The attribute can be renamed by the user
  • The attribute cannot be removed

Dictionary (Map Terms = ON):

  • Two attributes: one named as the node by default, and the second attribute named Mapped Term by default
  • Both attributes can be renamed by the user (this is tracked as feature request #127, "Provide edit option for headers under Map Terms")
  • The first attribute cannot be removed. The second attribute (corresponding to the Mapped Term) can be removed.
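The Dictionary and Literal defaults above can be sketched roughly as follows. This is only an illustration of the rules in this issue, not the project's code: `Attribute`, `removable`, and `default_dictionary_attributes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str        # user-visible name, renamable by the user
    removable: bool  # hypothetical flag: whether the user may remove it


def default_dictionary_attributes(node_name: str, map_terms: bool) -> list[Attribute]:
    """Default attributes for a Dictionary node (or a Literal when map_terms is False)."""
    # The first attribute is named after the node and can never be removed.
    attrs = [Attribute(name=node_name, removable=False)]
    if map_terms:
        # With Map Terms = ON, a second, removable attribute is added.
        attrs.append(Attribute(name="Mapped Term", removable=True))
    return attrs
```

Renaming is not restricted in this sketch because every attribute listed above can be renamed.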

Regex:

  • One attribute for each capturing group
  • All but one attribute can be removed
  • All attributes can be renamed
  • Example:
    • \$(\d+(\.\d+)?)\s+(billion) that captures text such as $3.4 billion should have 4 attributes by default:
    • Attribute 1 covers group 0 of the regex (the entire match) and matches $3.4 billion
    • Attribute 2 covers group 1 of the regex and matches 3.4
    • Attribute 3 covers group 2 of the regex and matches .4
    • Attribute 4 covers group 3 of the regex and matches billion
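As a quick check of the group numbering in the example above, here is a small sketch using Python's standard `re` module. The tool's own regex engine may differ; `regex_attribute_values` is a hypothetical helper, used only to show how group 0 plus each capturing group maps to one attribute.

```python
import re

# The regex from the example above.
pattern = re.compile(r"\$(\d+(\.\d+)?)\s+(billion)")


def regex_attribute_values(pattern: re.Pattern, text: str) -> list:
    """Return one value per attribute: group 0 (entire match), then each capturing group."""
    match = pattern.search(text)
    if match is None:
        return []
    # pattern.groups is the number of capturing groups, so there are groups + 1 attributes.
    return [match.group(i) for i in range(pattern.groups + 1)]


values = regex_attribute_values(pattern, "$3.4 billion")
print(values)  # ['$3.4 billion', '3.4', '.4', 'billion']
```

Note that an optional group that does not participate in a match (for example group 2 on `$3 billion`) yields `None` in Python, so the corresponding attribute would need an empty or null value.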

Sequence:

  • One attribute for each of the input nodes of the sequence
  • All but one attribute can be removed
  • All attributes can be renamed
  • Example 1: (<Metric.Metric>)<Token>{0,1}(<Preposition.Preposition>)<Token>{0,2}(<Division.Division>) that captures text such as revenue from the Global Technology Services should have 4 attributes by default (one for the entire match, and one for each open parenthesis):
    • Attribute 1 covers group 0 of the sequence (the entire match) and matches revenue from the Global Technology Services
    • Attribute 2 covers group 1 of the sequence and matches revenue
    • Attribute 3 covers group 2 of the sequence and matches from
    • Attribute 4 covers group 3 of the sequence and matches Global Technology Services
  • Example 2: (<Metric.Metric><Token>{0,1}<Preposition.Preposition>)<Token>{0,2}(<Division.Division>) that captures text such as revenue from the Global Technology Services should have 3 attributes by default (one for the entire match, and one for each open parenthesis - 2 of them):
    • Attribute 1 covers group 0 of the sequence (the entire match) and matches revenue from the Global Technology Services
    • Attribute 2 covers group 1 of the sequence and matches revenue from
    • Attribute 3 covers group 2 of the sequence and matches Global Technology Services
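The "one for the entire match, and one for each open parenthesis" rule from the two examples can be sketched as below. This assumes parentheses in a sequence pattern are used only for grouping (no escaped or literal parentheses); `sequence_attribute_count` is a hypothetical helper, not an existing function.

```python
def sequence_attribute_count(sequence_pattern: str) -> int:
    """Default attribute count: one for the entire match plus one per open parenthesis."""
    # Assumption: every "(" opens a group; literal parentheses are not handled.
    return 1 + sequence_pattern.count("(")


example_1 = "(<Metric.Metric>)<Token>{0,1}(<Preposition.Preposition>)<Token>{0,2}(<Division.Division>)"
example_2 = "(<Metric.Metric><Token>{0,1}<Preposition.Preposition>)<Token>{0,2}(<Division.Division>)"

print(sequence_attribute_count(example_1))  # 4
print(sequence_attribute_count(example_2))  # 3
```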

Union:

  • Same attributes as those of the input nodes (the union cannot be created unless the input nodes have the same schema - the same number of attributes and the same names for the attributes)
  • All but one attribute can be removed
  • All attributes can be renamed
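A minimal sketch of the schema check that gates creating a Union, under one assumption not stated above: attribute order does not matter, only the number of attributes and their names. `same_schema` is a hypothetical name.

```python
def same_schema(input_schemas: list[list[str]]) -> bool:
    """True if all input nodes have the same number of attributes with the same names."""
    if not input_schemas:
        return False
    # Assumption: order is ignored, so compare sorted attribute names.
    first = sorted(input_schemas[0])
    return all(sorted(schema) == first for schema in input_schemas[1:])


print(same_schema([["Metric", "Amount"], ["Amount", "Metric"]]))  # True
print(same_schema([["Metric", "Amount"], ["Metric"]]))            # False
```

If attribute order turns out to be significant for the Union, dropping the `sorted` calls makes the comparison order-sensitive.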

Consolidate:

  • Same attributes as those of the input node
  • All but one attribute can be removed
  • All attributes can be renamed

Filter:

  • Same attributes as those of the primary input node
  • All but one attribute can be removed
  • All attributes can be renamed
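The shared rule for Regex, Sequence, Union, Consolidate, and Filter ("all but one attribute can be removed, all can be renamed") could be enforced along these lines. This is a sketch only: `NodeAttributes` and its methods are hypothetical, and it models "removed" as an attribute being turned off. Dictionary and Literal nodes would need the stricter rule that the first attribute is never removable.

```python
class NodeAttributes:
    """Hypothetical model of a node's attributes with rename/remove rules."""

    def __init__(self, names: list[str]) -> None:
        if not names:
            raise ValueError("a node needs at least one attribute")
        self.names = list(names)
        self.enabled = [True] * len(names)

    def rename(self, index: int, new_name: str) -> None:
        # Every attribute can be renamed; only reject an empty name.
        if not new_name:
            raise ValueError("attribute name must not be empty")
        self.names[index] = new_name

    def remove(self, index: int) -> None:
        # Refuse to turn off the last attribute that is still enabled.
        if self.enabled[index] and sum(self.enabled) == 1:
            raise ValueError("at least one attribute must remain")
        self.enabled[index] = False

    def visible(self) -> list[str]:
        return [name for name, on in zip(self.names, self.enabled) if on]
```

For example, with `NodeAttributes(["Metric", "Division"])`, removing index 0 leaves `["Division"]`, and a further `remove(1)` raises `ValueError`.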
@marthacryan
Collaborator

@laurachiticariu Thank you for writing up all these examples and explanations! I have a couple of questions:

Consolidate

  • Same attributes as those of the input node
  • All but one attribute can be removed
  • All attributes can be renamed

When you say "the input node" - should this node type only have one input node? Or should I combine the attributes of all input nodes?

Filter

  • Same attributes as those of the primary input node
  • All but one attribute can be removed
  • All attributes can be renamed

Could you explain what the "primary input node" is?

@laurachiticariu
Collaborator Author

  1. The Consolidate node can only have a single input node. That's a constraint on the Consolidate node. I am not sure whether the UI enforces this constraint now (if it does not yet, it should).
  2. The primary node is the input from which we include or exclude tuples. It is the one shown at the top of the Filter dialog, right under the Exclude/Include dropdown, e.g., SentenceBoundary in the filter sample flow:

image
