Add in way to handle synonyms #37

Open

shjohnson wants to merge 5 commits into main from HOTT-4489-synonyms

Contributor

shjohnson commented Mar 1, 2024 (edited)

Jira link

https://transformuk.atlassian.net/browse/HOTT-4489

What?

This adds synonyms to the training data when generating the model. The synonym step runs after the search references step and enriches the data, so that when we run queries against the model there is more text to match to commodity numbers.
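For illustration only, a minimal sketch of the kind of enrichment described above. It is not the code in this PR: the function names, the shape of the synonym mapping, and the commodity codes are all hypothetical placeholders, and the real `SynonymExpander` may work quite differently.

```python
import re


def expand_with_synonyms(description, synonyms):
    """Return the description plus one variant per matching synonym.

    `synonyms` is a hypothetical mapping of term -> list of alternatives.
    """
    variants = [description]
    for term, alternatives in synonyms.items():
        # Match the term as a whole word, ignoring case.
        pattern = re.compile(rf"\b{re.escape(term)}\b", flags=re.IGNORECASE)
        if not pattern.search(description):
            continue
        for alternative in alternatives:
            # A function replacement avoids treating the synonym as a regex template.
            variant = pattern.sub(lambda _match: alternative, description)
            if variant not in variants:
                variants.append(variant)
    return variants


def enrich_training_rows(rows, synonyms):
    """Expand (description, commodity_code) rows; each variant keeps its code."""
    enriched = []
    for description, code in rows:
        for variant in expand_with_synonyms(description, synonyms):
            enriched.append((variant, code))
    return enriched


# Placeholder data: the codes below are made up for the example.
rows = [("fresh tomatoes", "0000000001"), ("leather footwear", "0000000002")]
synonyms = {"footwear": ["shoes", "boots"]}
enriched = enrich_training_rows(rows, synonyms)
```

In this sketch every synonym variant inherits the commodity code of the description it came from, which is what gives the model more text to match against the same code.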

shjohnson commented

training/prepare_data.py (resolved)

shjohnson force-pushed the HOTT-4489-synonyms branch 3 times, most recently from b9e8d2e to b1cca0d Compare

March 6, 2024 10:39

alexdesi reviewed

tests/test_synonym_expander.py (resolved)

alexdesi reviewed

training/enhance_data/enhance_data.py Outdated

		from training.synonym.synonym_expander import SynonymExpander


		class EnhanceData:

Contributor

alexdesi Mar 6, 2024

"Data" (in EnhanceData) sounds quite generic. What about EnhanceDescriptions or EnrichDescriptions?

Contributor Author

shjohnson Mar 7, 2024

Yeah, that's fair, will have a think

Contributor Author

shjohnson Mar 13, 2024

I still need to think about this actually 🤔

alexdesi reviewed

training/synonym/synonym_file_handler.py (outdated, resolved)

alexdesi reviewed

training/synonym/synonym_file_handler.py (resolved)

alexdesi reviewed

training/synonym/synonym_file_handler.py (resolved)

alexdesi reviewed

Contributor

alexdesi left a comment

It's really good.
To be honest, the expansion logic is not clear to me
(the code is clear; why we do it that way is not).
I've just added a few minor comments.

shjohnson force-pushed the HOTT-4489-synonyms branch 4 times, most recently from e6e94ac to 50c6a11 Compare

March 8, 2024 16:10

shjohnson commented

training/enhance_data/enhance_data.py (outdated, resolved)

shjohnson commented

training/synonym/synonym_expander.py (outdated, resolved)

shjohnson added 4 commits

March 13, 2024 16:34


          Add in way to handle synonyms


          enhance data improvements and fixes

cc05ff9


          tests improvements

3575a6c


          fix dependencies

b37ff1e

shjohnson force-pushed the HOTT-4489-synonyms branch 2 times, most recently from 8a120bf to 0e12b73 Compare

March 13, 2024 16:59


          Code cleanup

0d14afd

shjohnson force-pushed the HOTT-4489-synonyms branch from 0e12b73 to 0d14afd Compare

March 18, 2024 16:53
