[BUG] Building Diacritics #137

Open
Hamed1Hamed opened this issue Jan 25, 2024 · 0 comments
Hamed1Hamed commented Jan 25, 2024

Describe the bug
Unable to add diacritics using CAMeL Tools.
To Reproduce

Run main.py, which calls addDiacritics from Diacritics.py.


Diacritics.py

from camel_tools.tagger.default import DefaultTagger
from camel_tools.disambig.bert import BERTUnfactoredDisambiguator

def addDiacritics(message):
    # Load the pretrained MSA disambiguator and build a tagger that returns diacritized forms
    bertd = BERTUnfactoredDisambiguator.pretrained('msa')
    tagger = DefaultTagger(bertd, 'diac')

    # Tag sentence by sentence, preserving paragraph and sentence boundaries
    diacritized_paragraphs = []
    paragraphs = message.split("\n")
    for paragraph in paragraphs:
        sentences = paragraph.split(". ")
        diacritized_sentences = []
        for sentence in sentences:
            words = sentence.split()
            diacritized_words = tagger.tag(words)
            diacritized_sentence = ' '.join(diacritized_words)
            diacritized_sentences.append(diacritized_sentence)
        diacritized_paragraph = '. '.join(diacritized_sentences)
        diacritized_paragraphs.append(diacritized_paragraph)

    diacritized_message = '\n'.join(diacritized_paragraphs)
    with open('output.txt', 'w', encoding='utf-8') as f:
        f.write(diacritized_message)

    # Print the number of characters in the output
    print(f"Number of characters in the output: {len(diacritized_message)}")

main.py

from NonDiacritics import removeDiacritics
from Diacritics import addDiacritics

sentence = """ بسم الله الرحمن الرحيم"""
#removeDiacritics(sentence)
addDiacritics(sentence)
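
To help narrow down where the failure occurs, the paragraph and sentence handling in addDiacritics can be exercised without loading any model. The following is a minimal sketch, not part of the original script: `diacritize_text`, its `tag_words` parameter, and `identity_tagger` are hypothetical names introduced here for illustration. `tag_words` stands in for the tagger's `tag` method used in Diacritics.py.

```python
def diacritize_text(message, tag_words):
    """Apply tag_words to each sentence's word list, keeping the
    paragraph ("\n") and sentence (". ") boundaries used in Diacritics.py.

    tag_words is a hypothetical stand-in: any callable that takes a list
    of words and returns a list of strings of the same length.
    """
    diacritized_paragraphs = []
    for paragraph in message.split("\n"):
        diacritized_sentences = []
        for sentence in paragraph.split(". "):
            words = sentence.split()
            diacritized_sentences.append(" ".join(tag_words(words)))
        diacritized_paragraphs.append(". ".join(diacritized_sentences))
    return "\n".join(diacritized_paragraphs)


def identity_tagger(words):
    """Placeholder tagger that returns the words unchanged."""
    return list(words)


if __name__ == "__main__":
    print(diacritize_text("first sentence. second one\nnew paragraph", identity_tagger))
```

If this runs cleanly with the placeholder, the text handling is probably not the cause, and the problem more likely lies in loading the pretrained model or its data. In the real script, `tagger.tag` from Diacritics.py would be passed in place of `identity_tagger`.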

Expected behavior
Output: it is supposed to print the sentence in the output.txt file with diacritics.

Screenshots
'C:\Users\UserName\AppData\Roaming\camel_tools\data\disambig_bert_unfactored\msa\default_config.json'
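
The path above points at a configuration file inside the CAMeL Tools data directory. One possible explanation, which is an assumption and not confirmed by this report, is that the file is missing on disk. The sketch below only checks whether a file exists at that relative location. The helper names `bert_config_path` and `config_status` are hypothetical, and the use of the `APPDATA` environment variable is inferred from the path shown, not from CAMeL Tools documentation.

```python
import os
from pathlib import Path

# Relative location copied from the path quoted in this report.
CONFIG_PARTS = (
    "camel_tools", "data", "disambig_bert_unfactored", "msa", "default_config.json",
)


def bert_config_path(appdata_dir):
    """Build the expected config path under the given base directory."""
    return Path(appdata_dir).joinpath(*CONFIG_PARTS)


def config_status(appdata_dir):
    """Return a one-line description of whether the config file exists."""
    path = bert_config_path(appdata_dir)
    if path.is_file():
        return f"found: {path}"
    return f"missing: {path}"


if __name__ == "__main__":
    # Assumption: on Windows the base directory is %APPDATA%, as in the path above.
    base = os.environ.get("APPDATA", str(Path.home()))
    print(config_status(base))
```

If the file turns out to be missing, re-downloading the model data with the `camel_data` utility that ships with CAMeL Tools may be worth trying.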

Desktop:

  • OS: Windows 11 (latest update)

  • Python version: 3.9.18

  • CAMeL Tools version and installation source: Successfully installed camel-tools-1.5.2
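
The environment details above can also be collected programmatically, which makes it easier to compare against the setup that worked earlier. This is a small sketch using only the standard library. The function names are hypothetical, and the distribution name `camel-tools` is taken from the installer output quoted above.

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("camel-tools",)):
    """Summarize OS, Python version, and the versions of the given packages."""
    lines = [
        f"OS: {platform.system()} {platform.release()}",
        f"Python: {platform.python_version()}",
    ]
    for name in packages:
        version = package_version(name)
        lines.append(f"{name}: {version if version is not None else 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```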

Additional context

I last used camel-tools to add diacritics around August, and this code worked as expected at that time. It no longer works.

@Hamed1Hamed Hamed1Hamed changed the title [BUG] Title of new issue... [BUG] Building Diacritics Jan 25, 2024