group matches after extension #122

Open
thatbudakguy opened this issue Oct 19, 2020 · 1 comment

thatbudakguy commented Oct 19, 2020

this plays into #54, since the groups are a little easier to read/parse.

unlike some of the operations for #139, this grouping doesn't mutate edges. in fact, its output might be another kind of complete graph, or it might be a simple list of lists or some other non-graph structure. apparently networkx calls this shape (one node plus the edges that connect to it) a star; see e.g. add_star()

one way this could work using the data= param for networkx's edges():

  1. query for all the edges that connect to the given node using
g.edges([n, None], data=True)

where g is a MultiGraph in which nodes are docs and edges are matches, and n is the target doc. this will return all the data associated with each edge, and it will helpfully express the edges with the target node first:

MultiEdgeDataView([(n, other, {"foo": "bar"}), (n, other, {"foo": "bar"}), ...])
  2. group the edges via the sequence bounds in n. in other words, if there are two matches whose Span in n has the same start and end, group them. even if the actual aligned text differs due to spacing, we want to aggregate them all so that we can display the corresponding sequences that aren't in n together.
  3. when we display the group, we'll just use the actual unaligned text of n between both bounds of the Span (or group) once, followed by all the sequences from docs that aren't n.
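
a rough sketch of the grouping and display steps, in case it helps make the idea concrete. it only assumes the edges arrive as (n, other, data) triples, which is the shape `g.edges(n, data=True)` yields, so it runs on a plain list without networkx. the `"spans"` key, the (start, end) character offsets, the `texts` dict, and the names `group_matches` and `format_group` are all made up for illustration; the real edge data may look quite different.

```python
from collections import defaultdict


def group_matches(n, edges):
    """Group the edges touching doc `n` by the bounds of their span in `n`.

    `edges` is any iterable of (u, v, data) triples, e.g. what
    g.edges(n, data=True) yields. ASSUMPTION: data["spans"] maps each doc
    to its (start, end) offsets; this layout is hypothetical.
    """
    groups = defaultdict(list)
    for u, v, data in edges:
        # edges() puts the queried node first, but don't depend on it here
        other = v if u == n else u
        bounds = tuple(data["spans"][n])
        groups[bounds].append((other, tuple(data["spans"][other])))
    return dict(groups)


def format_group(n, bounds, members, texts):
    """Show n's unaligned text once, then each matching sequence from other docs."""
    start, end = bounds
    lines = [f"{n}: {texts[n][start:end]}"]
    for other, (o_start, o_end) in members:
        lines.append(f"  {other}: {texts[other][o_start:o_end]}")
    return "\n".join(lines)


# toy data: three docs and three matches, two of which share bounds in "a"
texts = {
    "a": "the quick brown fox",
    "b": "a quick brown dog",
    "c": "quick  brown cat",  # extra space: aligned text differs, bounds in "a" don't
}
edges = [
    ("a", "b", {"spans": {"a": (4, 15), "b": (2, 13)}}),
    ("a", "c", {"spans": {"a": (4, 15), "c": (0, 12)}}),
    ("a", "b", {"spans": {"a": (0, 3), "b": (0, 1)}}),
]

groups = group_matches("a", edges)
for bounds, members in groups.items():
    print(format_group("a", bounds, members, texts))
```

keying on the bounds in n rather than on the matched text is what lets the "quick  brown" match from c land in the same group as the one from b, even though the spacing differs.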
@thatbudakguy thatbudakguy added the enhancement New feature or request label Oct 19, 2020
@thatbudakguy thatbudakguy added this to the v2.0 milestone Oct 19, 2020
@thatbudakguy thatbudakguy removed this from the v2.0 milestone Feb 19, 2021
@thatbudakguy

deferring this past 2.0 since it's not trivial.

@thatbudakguy thatbudakguy added this to the v3.0 milestone Feb 19, 2021