Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plugin operators #574

Closed
wants to merge 5 commits into from
Closed

Add plugin operators #574

wants to merge 5 commits into from

Conversation

elronbandel
Copy link
Member

This is a proposal for a new behaviour that allows to change card operators from the final command such that:
load_dataset("card=cards.wikitq,table_serializer=serializers.table.markdown")
will be loading the wikitq with different table serializer, markdown serializer in this case.

Copy link

codecov bot commented Feb 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (0e2d06e) 87.54% compared to head (05fbded) 87.66%.
Report is 17 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #574      +/-   ##
==========================================
+ Coverage   87.54%   87.66%   +0.12%     
==========================================
  Files          83       84       +1     
  Lines        7104     7143      +39     
==========================================
+ Hits         6219     6262      +43     
+ Misses        885      881       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yoavkatz
Copy link
Member

If I understand there are two changes here.

  1. The ability to add constant fields to the datasets via the recipe (what happens If it overwrites an existing field?)
  2. An operator PlugIn operators that by defaults apply an operator to the stream, but allows changes the operator.

The Plugin operator is a bit tricky to understand (even the example is).


default_operator (List[str]): A list of default operators to be used if no operators are found in the instance.

Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is example is really not clear to me. Why is "processors.to_string" added to field "c"?

@@ -18,7 +20,13 @@
AddFields({"context_type": "table"}),
TruncateTableCells(max_length=15, table="table", text_output="answer"),
TruncateTableRows(field="table", rows_to_keep=50),
SerializeTableAsIndexedRowMajor(field_to_field=[["table", "context"]]),
PlugInOperator(
field="table_serializer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call it "CustomizableOperator" because it's more directly describes what you can do with it.

Then I would change "field" to "overide_operator_field" so it will be more clear what this field means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that people can replace the default operator by any other operator, so it can dramatically change the semantic of the card.

)
stream = recipe()

for instance in stream["train"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is tested in this example? We add a field but don't check or use it.

demo_format="User:{source}\nAgent:{target}\n\n",
model_input_format="{instruction}\n\n{demos}User:{source}\nAgent:",
),
additional_field="test",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whta if people will start adding fields that will not be compatible with different changes.
Perhaps, we will require a prefix like 'custom_field_override_serializer".

@dafnapension
Copy link
Collaborator

A naïve question.. : Since the card is a json file, a simpler, and perhaps better understood way (addressing @yoavkatz concern) would be to first load that json file into a json object, then change that json object, and use the edited version as the input to the recipe?
Or the card must be instantiated via fetch_artifact? can that ad-hoc edited card be added to the catalog via add_card?

…guments

Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants