Handling of lists or enumerations #604

Open
jonaskindermann opened this issue Apr 9, 2024 · 0 comments
Labels: enhancement (New feature or request)

jonaskindermann commented Apr 9, 2024

Hello,

we've noticed problems with translations into some languages, such as Finnish and German, that are especially noticeable in enumerations and lists.
Here is an example:

The text to translate:

Large language models, such as GPT-3 and BERT, offer several advantages in their premises, including:

1. Improved accuracy and performance in various natural language processing tasks, such as question answering, text generation, and sentiment analysis.
2. Ability to understand context and nuance in language, making them more versatile and adaptable to different types of text.
3. Capacity to handle longer and more complex inputs, allowing for more sophisticated and nuanced responses.
4. Ability to learn and adapt to new concepts and language patterns, making them more versatile and useful in a variety of applications.
5. Potential for significant advancements in fields such as artificial intelligence, cognitive science, and linguistics, as these models provide insights into the structure and function of human language.

We translated this text into Finnish and German. Here are the results.

Finnish:

Suuret kielimallit, kuten GPT-3 ja BERT, tarjoavat useita etuja tiloissaan.

1.1. Parannettu tarkkuus ja suorituskyky erilaisissa luonnollisissa kielenkäsittelytehtävissä, kuten kyselyvastauksessa, tekstintuotannossa ja tunneanalyysissä.
Kyky ymmärtää kontekstia ja vivahteita kielessä, mikä tekee niistä monipuolisempia ja sopeutuvaisempia erilaisiin tekstityyppeihin.
3.3.3. Kyky käsitellä pidempiä ja monimutkaisempia syöttöjä, mikä mahdollistaa kehittyneempiä ja vivahteellisempia vastauksia.
Kyky oppia ja sopeutua uusiin käsitteisiin ja kielikuvioihin, mikä tekee niistä monipuolisempia ja hyödyllisempiä monissa sovelluksissa.
5.5. Mahdollisuus merkittävään kehitykseen esimerkiksi tekoälyn, kognitiivisen tieteen ja kielitieteen aloilla, koska nämä mallit antavat tietoa ihmisen kielen rakenteesta ja toiminnasta.

Notice that the list formatting is lost and the numbering is broken: items 2 and 4 have no number at all, while items 1, 3 and 5 are numbered "1.1.", "3.3.3." and "5.5.".

German:

Große Sprachmodelle wie GPT-3 und BERT bieten in ihren Räumlichkeiten mehrere Vorteile, darunter:

ANHANG Verbesserte Genauigkeit und Leistung in verschiedenen natürlichen Sprachverarbeitungsaufgaben, wie etwa Fragebeantwortung, Textgenerierung und Stimmungsanalyse.
2. Fähigkeit, Kontext und Nuance in der Sprache zu verstehen, so dass sie vielseitiger und an verschiedene Arten von Texten anpassbar.
3. Kapazität, um längere und komplexere Eingaben zu handhaben, wodurch anspruchsvollere und nuancierte Antworten möglich sind.
4. Fähigkeit, neue Konzepte und Sprachmuster zu lernen und anzupassen, so dass sie vielseitiger und nützlicher in einer Vielzahl von Anwendungen.
5. Potenzial für signifikante Fortschritte in Bereichen wie künstliche Intelligenz, kognitive Wissenschaft und Linguistik, da diese Modelle Einblicke in die Struktur und Funktion der menschlichen Sprache bieten.

In the German translation, the marker "1." of the first item is replaced by "ANHANG" (German for "appendix"), which has nothing to do with the first item of an enumeration.
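To make the problem easier to reproduce and check automatically, here is a minimal sketch that compares the list markers of the source and the translation line by line. It assumes that a numbered item starts with one or more `<digits>.` groups followed by whitespace (so both "1. " and the broken "3.3.3. " are recognised); the names `list_markers` and `marker_mismatches` are hypothetical helpers, not part of any project API.

```python
import re

# Assumption: a numbered list item starts with one or more "<digits>." groups
# followed by whitespace, e.g. "1. " (well-formed) or "3.3.3. " (broken output).
MARKER = re.compile(r"^\s*((?:\d+\.)+)\s")


def list_markers(text: str) -> list[str]:
    """Return the list marker of every non-blank line ('' when the line has none)."""
    markers = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no list structure
        match = MARKER.match(line)
        markers.append(match.group(1) if match else "")
    return markers


def marker_mismatches(source: str, translation: str) -> list[tuple[int, str, str]]:
    """Compare markers line by line; return (line_index, expected, got) per difference."""
    src, dst = list_markers(source), list_markers(translation)
    mismatches = [(i, s, d) for i, (s, d) in enumerate(zip(src, dst)) if s != d]
    # A different number of non-blank lines is reported as one extra mismatch.
    if len(src) != len(dst):
        mismatches.append((min(len(src), len(dst)), f"{len(src)} lines", f"{len(dst)} lines"))
    return mismatches


# Usage with shortened texts shaped like the Finnish output above.
source = "Intro:\n\n1. First.\n2. Second.\n3. Third."
finnish_like = "Johdanto.\n\n1.1. Eka.\nToka.\n3.3.3. Kolmas."
print(marker_mismatches(source, finnish_like))
# [(1, '1.', '1.1.'), (2, '2.', ''), (3, '3.', '3.3.3.')]
```

The same check flags the German case, where "ANHANG" leaves the first item without a marker.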

We are not using auto-detection; instead, we set the correct language key explicitly in each request. The installation uses the Docker setup with no additional configuration.
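For reference, a sketch of what such a request might look like. The issue does not name the endpoint or port, so this assumes a LibreTranslate-style `/translate` JSON endpoint served by the Docker container at `http://localhost:5000`; the URL and the helper name `build_translate_request` are assumptions. The relevant detail is that `source` is set explicitly (for example `"en"`) instead of `"auto"`.

```python
import json
import urllib.request

# Assumption: a LibreTranslate-style JSON API exposed by the local Docker setup.
# The URL and port are illustrative; they are not stated in this issue.
API_URL = "http://localhost:5000/translate"


def build_translate_request(text: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST request with an explicit source language."""
    payload = {"q": text, "source": source, "target": target, "format": "text"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_translate_request("1. Improved accuracy and performance.", "en", "de")
```

Sending it with `urllib.request.urlopen(req)` would return a JSON body; in LibreTranslate the translated text is in its `translatedText` field.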

Kind regards

The github-actions bot added the enhancement label on Apr 9, 2024.