Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New langs #327

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

New langs #327

wants to merge 8 commits into from

Conversation

kareglazie
Copy link

Issue #324

Copy link
Owner

@pemistahl pemistahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • There is a copy of the accuracy reports directory in src. Please remove it. This is not where it should be located.
  • Please restrict the number of lines in the test data files to 1000 lines per file. This is totally sufficient to test the accuracy of a language model.

.gitignore Outdated
@@ -15,6 +15,7 @@
/pkg/
/target/
**/*.rs.bk
src/lingua_notes.ipynb
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line. It refers to a file that is not available in this PR.

Cargo.toml Outdated
@@ -17,7 +17,7 @@ members = ["language-models/*"]

[package]
name = "lingua"
version = "1.5.0"
version = "1.5.1"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New language models satisfy a jump to version 1.6.0.

Cargo.toml Outdated
Comment on lines 142 to 157
lingua-amharic-language-model = { path = "language-models/am", version = "1.1.0", optional = true }
lingua-burmese-language-model = { path = "language-models/my", version = "1.1.0", optional = true }
lingua-chechen-language-model = { path = "language-models/ce", version = "1.1.0", optional = true }
lingua-kyrgyz-language-model = { path = "language-models/ky", version = "1.1.0", optional = true }
lingua-malayalam-language-model = { path = "language-models/ml", version = "1.1.0", optional = true }
lingua-nepali-language-model = { path = "language-models/ne", version = "1.1.0", optional = true }
lingua-pashto-language-model = { path = "language-models/ps", version = "1.1.0", optional = true }
lingua-sanskrit-language-model = { path = "language-models/sa", version = "1.1.0", optional = true }
lingua-sinhala-language-model = { path = "language-models/si", version = "1.1.0", optional = true }
lingua-sindhi-language-model = { path = "language-models/sd", version = "1.1.0", optional = true }
lingua-tatar-language-model = { path = "language-models/tt", version = "1.1.0", optional = true }
lingua-tajik-language-model = { path = "language-models/tg", version = "1.1.0", optional = true }
lingua-turkmen-language-model = { path = "language-models/tk", version = "1.1.0", optional = true }
lingua-uzbek-language-model = { path = "language-models/uz", version = "1.1.0", optional = true }
lingua-lao-language-model = { path = "language-models/lo", version = "1.1.0", optional = true }
lingua-khmer-language-model = { path = "language-models/km", version = "1.1.0", optional = true }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a language model is completely new, then the initial version should be 1.0.0.

Cargo.toml Outdated
Comment on lines 199 to 201
"welsh", "xhosa", "yoruba", "zulu", "chechen", "amharic", "burmese",
"kyrgyz", "nepali", "malayalam", "pashto", "sanskrit", "sinhala",
"tatar", "tajik", "turkmen", "uzbek", "sindhi", "lao", "khmer"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort all languages alphabetically, here and everywhere else in the project.

Comment on lines 856 to 857
// Some(WhatlangLanguage::Nob) => Some(Language::Bokmal),
Some(WhatlangLanguage::Nob) => Some(Language::Norwegian),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no entry for Norwegian in the Language enum anymore. So this will probably not compile at all. Please remove this line again and replace it with Bokmal.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello again, I've made some changes according to your review a while ago, was not sure if you'd received the notification

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, I haven't had time so far to continue working on my open source projects. Been busy with my job and family. But I did receive your notification and will look at it again as soon as I find the time.

@kareglazie
Copy link
Author

Thanks a lot for your review! I've made corrections to comply with your remarks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants