Normalization failed / Invalid start of grapheme sequence Error While training the tesseract model #345

Sanketnarkhede-10 · 2023-06-07T03:55:12Z

Normalization failed for string 'ଜୀବନକୁ ନିବିଡ଼ ଭାବେ ଏକନ୍ୱିତ କରିଛନ୍ତି'
Invalid start of grapheme sequence:D=0xb71
Normalization failed for string 'ପରମ୍ପରାକୁ ଅବଲମ୍ୱନ କରିଛନ୍ତି, ସେତିକି ମଧ୍ୟ'
Invalid start of grapheme sequence:M=0xb48
Normalization failed for string 'ଦ୍ୱୈତ ରୂପରେ ଦେଖିଥିଲେ, ଏଠାରେ ପୁରୁଷ'
Invalid start of grapheme sequence:M=0xb47
Normalization failed for string 'ତାଙ୍କ ହୃଦୟ ବିଭୋର ହୋଇଛି ସମ୍ୱେଦନଶୀଳତାରେ;'
Invalid start of grapheme sequence:D=0xb71

I'm getting this error while training the Tesseract OCR model for the Oriya language. Please help me resolve this issue.
I'm attaching the ground truth files.

Training on Tesseract 4.1.1:
tesseract 4.1.1
leptonica-1.82.0

ocr_training.zip

stweil · 2023-06-07T04:29:31Z

Try to shorten those strings in your training data until the error messages disappear, then check what was wrong with them.

And please use the latest Tesseract version 5.3.1 instead of 4.1.1.
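
When shortening a failing string, it can help to see exactly which code points surround the one named in the error message (for example `0xb71` in `D=0xb71`). The following is a minimal sketch using only Python's standard `unicodedata` module. The helper names `describe` and `context_of` are hypothetical, and the sample text is a short hand-built cluster, not a line copied from the ground truth files. It only lists code points and does not reproduce Tesseract's own validation rules.

```python
import unicodedata


def describe(text):
    """Return one (hex code point, Unicode name) pair per character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def context_of(text, code_point, width=2):
    """Return the slice of text around each occurrence of code_point."""
    target = chr(code_point)
    return [
        text[max(0, i - width): i + width + 1]
        for i, ch in enumerate(text)
        if ch == target
    ]


# Hypothetical sample: NA + VIRAMA + U+0B71 + VOWEL SIGN I + TA.
# Replace it with a line from your own ground truth file.
sample = "\u0B28\u0B4D\u0B71\u0B3F\u0B24"

# 0x0B71 is the code point taken from the "D=0xb71" error message.
for snippet in context_of(sample, 0x0B71):
    for code, name in describe(snippet):
        print(code, name)
    print("---")
```

Running this over each reported line shows what precedes the flagged code point, which is a reasonable starting place for finding what the normalizer objects to.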
