
Incorrect detection of sentence boundaries when the last sentence is missing end-of-sentence punctuation (trf model) #13356


Hi!

In this pretrained pipeline, sentence segmentation is actually done by the parser, and the model was mostly trained on texts with correct punctuation. So, unfortunately, this type of occasional error is unavoidable.

If you'd like more predictable behaviour, you can use the sentencizer instead. It is a simpler rule-based component that splits sentences on punctuation such as ".", "!" or "?".
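A minimal sketch of the sentencizer on its own, assuming spaCy v3 is installed. It uses a blank English pipeline, so no trained model needs to be downloaded. The example text is made up. It shows that a final sentence without closing punctuation still ends up as its own sentence, because the sentencizer only starts a new sentence after punctuation and the document end closes the last one.

```python
import spacy

# A blank English pipeline contains only a tokenizer; add the rule-based sentencizer.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical example text: the second sentence deliberately lacks final punctuation.
doc = nlp("This is the first sentence. This one has no final period")

# The sentencizer sets sentence starts, so doc.sents is available without a parser.
sentences = [sent.text for sent in doc.sents]
print(sentences)
```

The set of splitting characters can be adjusted through the component's "punct_chars" config setting, for example nlp.add_pipe("sentencizer", config={"punct_chars": [".", "!", "?"]}).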

Answer selected by svlandeg
Labels: feat / parser (Feature: Dependency Parser), perf / accuracy (Performance: accuracy)
2 participants

This discussion was converted from issue #13351 on February 27, 2024 12:58.