Add preserve original on ascii folding filter. #2126
base: main
Codecov Report
@@            Coverage Diff             @@
##             main    #2126      +/-   ##
==========================================
+ Coverage   94.37%   94.38%   +0.01%
==========================================
  Files         321      319       -2
  Lines       60821    60791      -30
==========================================
- Hits        57401    57379      -22
+ Misses       3420     3412       -8
         if !self.token_mut().text.is_ascii() {
             // ignore its already ascii
    -        to_ascii(&self.tail.token().text, self.buffer);
    +        text_has_changed = to_ascii(&self.tail.token().text, self.buffer);
Can `text_has_changed == false` happen here even though the `is_ascii` test above already failed?
Just checked `to_ascii`: as you would expect, it does not change a fully ASCII text... I will fix that, thanks.
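The reviewer's question depends on what `to_ascii` reports. Below is a minimal sketch, assuming the helper writes the folded text into a caller-provided buffer and returns `true` only when at least one character was actually replaced. `fold_char` and its small mapping table are hypothetical stand-ins for the filter's real (much larger) mapping, not tantivy's code.

```rust
/// Hypothetical, tiny folding table: returns the ASCII replacement for a few
/// accented Latin characters, or `None` when no folding is known.
fn fold_char(c: char) -> Option<&'static str> {
    match c {
        'à' | 'á' | 'â' | 'ä' => Some("a"),
        'é' | 'è' | 'ê' | 'ë' => Some("e"),
        'ç' => Some("c"),
        'ß' => Some("ss"),
        _ => None,
    }
}

/// Folds `text` into `buffer` and returns whether any character was replaced.
/// Characters without a mapping (including all ASCII ones) are copied as-is.
fn to_ascii(text: &str, buffer: &mut String) -> bool {
    buffer.clear();
    let mut changed = false;
    for c in text.chars() {
        match fold_char(c) {
            Some(folded) => {
                buffer.push_str(folded);
                changed = true;
            }
            None => buffer.push(c),
        }
    }
    changed
}
```

Under these assumptions a fully ASCII input always returns `false`, and so does a non-ASCII token with no mapping (e.g. `"日本"`), so a token that fails the `is_ascii` test is not guaranteed to yield `text_has_changed == true`.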
Same as the Lucene filter: https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
There is one minor difference between the two filters when `preserve_original` is set to `true` and the original token differs from the folded token: the Lucene filter emits the folded token first and then the original token, whereas tantivy emits the original token first and then the folded one.
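To make the ordering difference concrete, here is a minimal sketch of the tokens emitted for a single input token. `fold`, `emitted_tokens`, and `Order` are hypothetical names, not tantivy's or Lucene's API, and `fold` handles only a few accented letters. It also assumes that only one token is emitted when folding changes nothing, since the difference above is stated only for tokens whose folded form differs.

```rust
/// Which token comes first when `preserve_original` is enabled.
#[derive(Clone, Copy)]
enum Order {
    /// tantivy, as described above: original, then folded.
    OriginalFirst,
    /// Lucene's ASCIIFoldingFilter: folded, then original.
    FoldedFirst,
}

/// Hypothetical fold: replaces a few accented characters, leaves the rest as-is.
fn fold(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            'à' | 'â' => 'a',
            'é' | 'è' | 'ê' => 'e',
            'ç' => 'c',
            other => other,
        })
        .collect()
}

/// Returns the tokens emitted for `text`, in emission order.
fn emitted_tokens(text: &str, preserve_original: bool, order: Order) -> Vec<String> {
    let folded = fold(text);
    if !preserve_original || folded == text {
        // Nothing to preserve, or folding was a no-op: a single token.
        return vec![folded];
    }
    match order {
        Order::OriginalFirst => vec![text.to_string(), folded],
        Order::FoldedFirst => vec![folded, text.to_string()],
    }
}
```

For `"café"` with `preserve_original` enabled, the original-first order yields `café` then `cafe`, and the folded-first order yields the reverse; with `preserve_original` disabled, both yield only `cafe`.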