Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a new normalizer to normalize œ to oe and æ to ae #278

Merged
merged 9 commits into from May 21, 2024

Conversation

Soham1803
Copy link
Contributor

Pull Request

Related issue

Fixes #268

What does this PR do?

  • Creates a new normalizer ae_oe_normalizer
  • normalizes œ and Œ to oe, æ and Æ to ae.

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

@Soham1803 Soham1803 changed the title Adds a new normalizer to normalize œ to oe and æ to ae. Adds a new normalizer to normalize œ to oe and æ to ae Apr 6, 2024
Comment on lines 5 to 9
// Make a small documentation of the specialized Normalizer like below.
/// <Script/Language> specialized [`Normalizer`].
///
/// This Normalizer uses [`<UsedLibraryToNormalize>`] internally to normalize the provided token.
/// <OptionalAdditionnalExplanations>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please modify the documentation in order to explain the implementation, please?

Comment on lines 38 to 42
// Include the newly implemented Normalizer in the tokenization pipeline:
// - change the name of the file `dummy_example.rs` to `dummy.rs`
// - import module by adding `mod dummy;` (filename) in `normalizer/mod.rs`
// - Add Normalizer in `NORMALIZERS` in `normalizer/mod.rs`
// - check if it didn't break any test or benhchmark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The normalizer is implemented, however, you didn't import it in the mod.rs file and added in NORMALIZERS, you should follow the bellow instructions and remove them from the file:

Suggested change
// Include the newly implemented Normalizer in the tokenization pipeline:
// - change the name of the file `dummy_example.rs` to `dummy.rs`
// - import module by adding `mod dummy;` (filename) in `normalizer/mod.rs`
// - Add Normalizer in `NORMALIZERS` in `normalizer/mod.rs`
// - check if it didn't break any test or benhchmark

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just want to confirm one thing, should it be added in NORMALIZERS or LOSSY_NORMALIZERS?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Soham1803,

deeply sorry for the delay,
LOSSY_NORMALIZER is better

charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs.rs Outdated Show resolved Hide resolved
@ManyTheFish
Copy link
Member

Hello @Soham1803,
Thank you for your PR, I made several change requests before accepting to merge it,

let me know if you need more informations

@Soham1803
Copy link
Contributor Author

Hey @ManyTheFish I have made the requested changes can you please check and let me know. Thank You!

ManyTheFish
ManyTheFish previously approved these changes Apr 29, 2024
Copy link
Member

@ManyTheFish ManyTheFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @Soham1803

Thank you for the PR,
nice work,

bors merge

meili-bors bot added a commit that referenced this pull request Apr 29, 2024
278: Adds a new normalizer to normalize œ to oe and æ to ae r=ManyTheFish a=Soham1803

# Pull Request

## Related issue
Fixes #268

## What does this PR do?
- Creates a new normalizer *ae_oe_normalizer*
- normalizes `œ` and `Œ`  to `oe`,  `æ` and `Æ` to `ae`. 

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!


Co-authored-by: Soham <mastpanchalsoham@gmail.com>
@ManyTheFish
Copy link
Member

Hello @Soham1803,

The tests don't seem to work. Could you fix them?

Thank you

Copy link
Contributor

meili-bors bot commented Apr 29, 2024

Build failed:

  • tests

@Soham1803
Copy link
Contributor Author

Hey @ManyTheFish, I'm trying to run the tests on my local machine but got some issues with jemalloc-sys in .cargo. The last commit I made is based on the logs of test run I read on Github. Its good if the tests pass now, or else I will make another commit after being assured all the tests are passed.

Copy link
Member

@ManyTheFish ManyTheFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes that I requested should fix the errors to run the tests

charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/Cargo.toml Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
charabia/src/normalizer/ae_oe_normalizer.rs Outdated Show resolved Hide resolved
@Soham1803
Copy link
Contributor Author

Hey @ManyTheFish sorry for the delay. Thanks for your suggested changes. The last commit passed all the tests on local successfully, after many different tryouts. Made some major changes to the Normalizer. Suggest me any changes if necessary before the merge.

Thank You!

@curquiza
Copy link
Member

curquiza commented May 6, 2024

Hello @Soham1803

@ManyTheFish is on Holidays and will review your PR when coming back

In the meantime, can you fix the Rustfmt tests? 😊

Thanks again for your PR

@Soham1803
Copy link
Contributor Author

Hello @curquiza, dealt with the Rustfmt tests. I guess all tests are passed. Now, just waiting for @ManyTheFish to suggest any required changes before the merge.

Thank You! 😊

Copy link
Member

@ManyTheFish ManyTheFish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Thank you @Soham1803 for your work. This should be part of the Meilisearch v1.9.0 release!

bors merge

Copy link
Contributor

meili-bors bot commented May 21, 2024

Build succeeded:

@meili-bors meili-bors bot merged commit 6b270fc into meilisearch:main May 21, 2024
4 checks passed
@Soham1803
Copy link
Contributor Author

Nice to have you back @ManyTheFish. Thanks for accepting my PR and the help you provided. 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Normalize "œ" / "æ" into "oe" / "ae"
3 participants