Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuse, split, MultiStreamScoreMean, and merge all #767

Merged
merged 8 commits into from May 12, 2024

Conversation

dafnapension
Copy link
Collaborator

@dafnapension dafnapension commented Apr 13, 2024

of each instance
per issue #764

"""

origins: List[SourceOperator]
origins: Dict[str, SourceOperator]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
origins: Dict[str, SourceOperator]
origins: Union[List[SourceOperator], Dict[str, SourceOperator]]

I think we should allow also anonymus sources merging which are merged without given name per source

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 67 to 73
if "past_groups" in instance:
instance["past_groups"] = (
instance["group"] + "/" + instance["past_groups"]
)
else:
instance["past_groups"] = instance["group"]
instance["group"] = origin_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if "past_groups" in instance:
instance["past_groups"] = (
instance["group"] + "/" + instance["past_groups"]
)
else:
instance["past_groups"] = instance["group"]
instance["group"] = origin_name
group = origin_name + "/" + instance["group"]
else:
group = origin_name
instance["group"] = group

Why not like this? more simple and dont require additional variable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that code that reads the "group" (e.g, the splitter) will not need to check if any past is glued to this group, and if glued -- peel it off. Only if the consuming code is explicitly interested in the past, it will look into "past_groups", and do whatever it wants.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed per your request, @elronbandel. We will know what to do if the past haunts us..

Comment on lines 79 to 86
if "group" in instance:
if "past_groups" in instance:
instance["past_groups"] = (
instance["group"] + "/" + instance["past_groups"]
)
else:
instance["past_groups"] = instance["group"]
instance["group"] = origin_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here dont think past_groups are necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, same thing: trying to work here, in order to save work from the consure (e.g. Splitter)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here too

Copy link
Member

@elronbandel elronbandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments, in general:

  1. I think we should have only one field "group" containing all the grouping information.
  2. I think we should test also hirarchial group naming works within the unittests
  3. I think we should allow also Fusion of streams without names
  4. We should test the tests actually reutrn score per group with the existing mecahinsm (i linked the exact line in the metric pipeline in the related issue)

@dafnapension
Copy link
Collaborator Author

dafnapension commented Apr 14, 2024

I added some comments, in general:

  1. I think we should have only one field "group" containing all the grouping information.
  2. I think we should test also hirarchial group naming works withing the unittests
  3. I think we should allow also Fusion of streams without names
  4. We should test the tests actually reutrn score per group with the existing mecahinsm (i linked the exact line in the metric pipeline in the related issue)

OK. I just thought to save work from the consuming code, but accept your judgement, of course.

Exactly this line that you linked to in the issue, reads:
SplitByValue(["group"])

As it is, it can not work when group carries the whole history, unless you would like indeed to split them all by the whole history, and not just by the most recent fuse.

@dafnapension dafnapension changed the title name the sources for fusion and document that name in field 'group' name the sources for fusion and document that name in field 'group' , yet allow anonymous sources too Apr 15, 2024
@dafnapension dafnapension force-pushed the fuse_named_resources branch 5 times, most recently from e479174 to 210b4d5 Compare May 5, 2024 15:26
Copy link

codecov bot commented May 5, 2024

Codecov Report

Attention: Patch coverage is 96.19048% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 92.14%. Comparing base (5783d76) to head (210b4d5).
Report is 42 commits behind head on main.

❗ Current head 210b4d5 differs from pull request most recent head e829415. Consider uploading reports for the commit e829415 to get more accurate results

Files Patch % Lines
src/unitxt/fusion.py 93.02% 3 Missing ⚠️
src/unitxt/operators.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #767      +/-   ##
==========================================
+ Coverage   91.48%   92.14%   +0.65%     
==========================================
  Files         100      103       +3     
  Lines       10169    10614     +445     
==========================================
+ Hits         9303     9780     +477     
+ Misses        866      834      -32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dafnapension dafnapension force-pushed the fuse_named_resources branch 2 times, most recently from b2203d4 to 7210ed8 Compare May 8, 2024 08:03
@@ -1560,6 +1560,48 @@ def process(self, multi_stream: MultiStream) -> MultiStream:
return MultiStream(result)


class SplitByGroup(MultiStreamOperator):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class SplitByGroup(MultiStreamOperator):
class SplitByNestedGroup(MultiStreamOperator):
field: str = "group"

@dafnapension dafnapension force-pushed the fuse_named_resources branch 2 times, most recently from 7da0bfc to fceaeca Compare May 10, 2024 19:13
@dafnapension dafnapension changed the title name the sources for fusion and document that name in field 'group' , yet allow anonymous sources too fuse, split, MultiStreamScoreMean, and merge all May 10, 2024
@dafnapension
Copy link
Collaborator Author

Hi @elronbandel , I hope this is more in the direction you had in mind.

splits: List of splits to include. If None, all splits are included.
origins: Dict of named SourceOperator objects (each to yield a MultiStream), or a list thereof
splits: List of splits (stream_names) to include, over all input multistreams. If None, all splits are included.
max_instances_per_origin_split: Number of instances to take from each input split of each input multistream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! the old name was disambgious!

Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
… the operation done

Signed-off-by: dafnapension <dafnashein@yahoo.com>
Signed-off-by: dafnapension <dafnashein@yahoo.com>
…of just one stream - it returns it as is. No need to generate a nested structure for just one leaf.

Signed-off-by: dafnapension <dafnashein@yahoo.com>
@dafnapension dafnapension merged commit 1b1f6d4 into main May 12, 2024
7 checks passed
@dafnapension dafnapension deleted the fuse_named_resources branch May 12, 2024 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants