Suggestion: More informed approach to multi-linguality in Unitxt #758

Open
elronbandel opened this issue Apr 10, 2024 · 0 comments
Labels
ease-of-use Readability, requires redundant work, unintuitive etc.

elronbandel commented Apr 10, 2024

Currently, a given template in unitxt has several versions, one for each language.

For example:
English sentiment template:

template = InputOutputTemplate(input_format="Classify the sentiment of this text: {text}")

Dutch sentiment template:

template = InputOutputTemplate(input_format="Classificeer het sentiment van deze tekst: {text}")

The problem is that we maintain many templates for different languages that all say logically the same thing. Moreover, we also need formats for each language, and we trust our users to switch every relevant part of the recipe to the artifact in the correct language. I want to suggest a simple solution that lets a user pass the recipe a single argument, language=deutch, and have the template, format, etc. adjusted automatically.

My suggestion is to create a new class, MultiString, that holds strings for different languages:

input_format = MultiString(
    english="Classify the sentiment of this text: {text}",
    deutch="Classificeer het sentiment van deze tekst: {text}",
)
template = InputOutputTemplate(input_format=input_format)

Finally, the language would be selected with a context manager:

with set_language("deutch", when_not_exist="english"):
    # code here is affected by the language setting
    ...

Everything within that context manager would resolve each MultiString to the requested language, or to the when_not_exist language when no string exists for the requested one.

This would allow us to add a general variable to the unitxt recipe, for example prompting_language=english.

@elronbandel elronbandel added the ease-of-use Readability, requires redundant work, unintuitive etc. label Apr 10, 2024