StringComparison.InvariantCulture behavior change between .NET Core 3.1 and .NET 5 #44687

Closed
Zintom opened this issue Nov 14, 2020 · 9 comments
Labels: area-System.Globalization, untriaged (new issue has not been triaged by the area owner)

Zintom commented Nov 14, 2020

Description

Using StringComparison.InvariantCulture with string.LastIndexOf(string, comparison) returns different results in .NET 5 compared to .NET Core 3.1 when looking for the index of a Unicode character.

Witness the following code:

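using System;
using System.Diagnostics;

string specialChar = "\u007f"; // U+007F (DELETE control character)
string testString = "hello" + specialChar + "world";

// Culture-aware search with the invariant culture, as described above.
Debug.Assert(testString.LastIndexOf(specialChar, StringComparison.InvariantCulture) == 5);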

On .NET Core 3.1 this assertion passes, whereas on .NET 5 it fails. In fact, on .NET 5 the LastIndexOf method returns the index of the end of the string, whereas on 3.1 it correctly returns 5.

Supposition

I have tried source stepping into the LastIndexOf method but frankly, I do not understand it well enough to make an informed analysis of why this happens :( My supposition is that it has something to do with InvariantCulture and Unicode, because if I change the comparison mode to Ordinal, the assertion holds in both .NET versions.

Dotnet-GitSync-Bot added the area-System.Globalization and untriaged labels Nov 14, 2020

ghost commented Nov 14, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

EgorBo (Member) commented Nov 14, 2020

ICU-related; it doesn't reproduce when NLS is enabled with

<RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
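
To check which globalization backend a process is actually using, a small probe along these lines may help. It is only a sketch: it relies on the version numbers reported by CompareInfo.Version, following the approach Microsoft's ICU documentation describes, and the name IsIcuMode is made up for this example.

```csharp
using System;
using System.Globalization;

// Hypothetical helper name; the check mirrors the one described in the
// .NET globalization docs and is a heuristic, not a dedicated API.
static bool IsIcuMode()
{
    SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
    byte[] bytes = sortVersion.SortId.ToByteArray();
    // Rebuild the first 32 bits of the sort ID as a little-endian integer.
    int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
    // Under ICU these bits are non-zero and equal to the reported full version.
    return version != 0 && version == sortVersion.FullVersion;
}

bool icu = IsIcuMode();
Console.WriteLine(icu ? "Globalization backend: ICU" : "Globalization backend: NLS (or other)");
```

The result depends on the OS, the runtime version, and whether System.Globalization.UseNls is set, so no particular output is implied here.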

Zintom (Author) commented Nov 14, 2020

@EgorBo thanks for the snappy response 😀

Can I ask why this behavior was changed? The MS docs state that you now have to use Ordinal to correctly find the index of a string within a string. Surely that just adds extra boilerplate code? Or is there something I'm missing?

EgorBo (Member) commented Nov 14, 2020

@Zintom ah no, it's just a note 🙂
Looks like it's the same as #44439

Zintom (Author) commented Nov 14, 2020

@EgorBo oh I see, I'm somewhat of a noob to GitHub 😁

safern (Member) commented Nov 14, 2020

Thanks @Zintom for reporting the issue. I'm closing it as a dupe of: #44439

Also, please see the last comment on #43736, which explains the reasoning behind the change and points to the docs we wrote about it.

Long story short, we recommend using Ordinal because culture-aware comparisons may give different results on different machines. Also please see #43956, which explains why there is a lot of confusion in our APIs: some are ordinal by default and some are culture-aware.

As a workaround for now you can switch back to NLS or use Ordinal comparison.
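
For the Ordinal route, a minimal sketch might look like this. LastOrdinalIndexOf is a hypothetical wrapper written for this example (not a framework API); it only exists to keep the explicit comparison argument in one place.

```csharp
using System;

// Hypothetical helper: always searches by exact UTF-16 code units.
static int LastOrdinalIndexOf(string text, string value) =>
    text.LastIndexOf(value, StringComparison.Ordinal);

string specialChar = "\u007f"; // U+007F (DELETE), from the original report
string testString = "hello" + specialChar + "world";

// Ordinal string search: independent of NLS vs ICU.
int viaHelper = LastOrdinalIndexOf(testString, specialChar);

// The char overload of LastIndexOf is also an ordinal search.
int viaChar = testString.LastIndexOf('\u007f');

Console.WriteLine(viaHelper); // 5
Console.WriteLine(viaChar);   // 5
```

Both calls find the character at index 5 because "hello" occupies indices 0 through 4.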

safern closed this as completed Nov 14, 2020

GrabYourPitchforks (Member) commented Nov 14, 2020

This is not a dupe of #44439. #44439 describes a bug where LastIndexOf returns an unexpected value on ICU; there, the API is not behaving as specced.

Here, the API is behaving as specced, but the caller almost certainly intended to perform an ordinal comparison instead of a linguistic comparison. That is, the caller is invoking the wrong overload, but this wasn't apparent until the recent NLS -> ICU switch. I've added it to the list at the top of #43956.
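
To make the overload distinction concrete, here is a rough side-by-side sketch using the string from the original report. The two linguistic calls are deliberately left without expected values, since their results depend on the globalization backend. A likely explanation, stated here as an assumption, is that ICU treats U+007F as ignorable in linguistic comparisons.

```csharp
using System;

string specialChar = "\u007f"; // U+007F (DELETE), as in the original report
string testString = "hello" + specialChar + "world";

// No comparison argument: culture-sensitive search using the current culture.
int defaultOverload = testString.LastIndexOf(specialChar);

// Explicit linguistic search with the invariant culture; also backend-dependent.
int invariantCulture = testString.LastIndexOf(specialChar, StringComparison.InvariantCulture);

// Explicit ordinal search: compares UTF-16 code units, independent of backend.
int ordinal = testString.LastIndexOf(specialChar, StringComparison.Ordinal);

Console.WriteLine($"default (current culture): {defaultOverload}");
Console.WriteLine($"InvariantCulture:          {invariantCulture}");
Console.WriteLine($"Ordinal:                   {ordinal}");
```

Only the ordinal line is guaranteed to print 5 on every platform; the other two are the ones that changed between .NET Core 3.1 and .NET 5 in the report.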

safern (Member) commented Nov 14, 2020

Thanks, @GrabYourPitchforks for elaborating.

Zintom (Author) commented Nov 14, 2020

Yeah my bad @GrabYourPitchforks

pleonex added a commit to SceneGate/Yarhl that referenced this issue Dec 14, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 14, 2020