New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Array.IndexOf at string[] slower that naive implementation #101595
Comments
Tagging subscribers to this area: @dotnet/area-system-runtime |
I wonder if inverting the |
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
A minimal repro for the underlying issue: [MethodImpl(MethodImplOptions.NoInlining)]
public bool Test<T>(T[] array, T value) where T : IEquatable<T>
{
return array[0].Equals(value);
} We currently are never able to optimize this Equals if the method is not inlined into non-shared generic context. Dynamic PGO also gives up here because VM tells JIT the method should be invoked as indirect and we never optimize such calls with PGO currently. Codegen: ; Assembly listing for method Test[System.__Canon](System.__Canon[],System.__Canon):ubyte:this (Tier1)
; Emitting BLENDED_CODE for X64 with AVX512 - Windows
; Tier1 code
; optimized code
G_M40771_IG01: ;; offset=0x0000
push rsi
push rbx
sub rsp, 40
mov qword ptr [rsp+0x20], rdx
mov rbx, r8
mov rsi, r9
;; size=17 bbWeight=1 PerfScore 3.75
G_M40771_IG02: ;; offset=0x0011
mov rcx, qword ptr [rdx+0x38]
mov r11, qword ptr [rcx+0x10]
test r11, r11
je SHORT G_M40771_IG04
;; size=13 bbWeight=1 PerfScore 5.25
G_M40771_IG03: ;; offset=0x001E
jmp SHORT G_M40771_IG05
;; size=2 bbWeight=0.80 PerfScore 1.60
G_M40771_IG04: ;; offset=0x0020
mov rcx, rdx
mov rdx, 0x7FFB713E76B0 ; global ptr
call CORINFO_HELP_RUNTIMEHANDLE_METHOD
mov r11, rax
;; size=21 bbWeight=0.20 PerfScore 0.35
G_M40771_IG05: ;; offset=0x0035
cmp dword ptr [rbx+0x08], 0
jbe SHORT G_M40771_IG07
mov rcx, gword ptr [rbx+0x10]
mov rdx, rsi
call [r11]
nop
;; size=17 bbWeight=1 PerfScore 9.50
G_M40771_IG06: ;; offset=0x0046
add rsp, 40
pop rbx
pop rsi
ret
;; size=7 bbWeight=1 PerfScore 2.25
G_M40771_IG07: ;; offset=0x004D
call CORINFO_HELP_RNGCHKFAIL
int3
; Total bytes of code 83
cc @AndyAyersMS |
@EgorBo It seems strange to me that after splitting the method into three, it runs twice as fast. See the method IndexOfSplitted of second benchmark. |
Simpler: [Benchmark]
public int IndexOfNullCheck() => IndexOfNullCheck<string>(_strArr, "10000", 0, _strArr.Length);
private static int IndexOfNullCheck<T>(T[] array, T value, int startIndex, int count)
where T : IEquatable<T>
{
if (value == null)
{
return -1;
}
int endIndex = startIndex + count;
for (int i = startIndex; i < endIndex; i++)
{
if (array[i] != null && array[i]!.Equals(value)) return i;
}
return -1;
}
[Benchmark]
public int IndexOfNoNullCheck() => IndexOfNoNullCheck<string>(_strArr, "10000", 0, _strArr.Length);
private static int IndexOfNoNullCheck<T>(T[] array, T value, int startIndex, int count)
where T : IEquatable<T>
{
int endIndex = startIndex + count;
for (int i = startIndex; i < endIndex; i++)
{
if (array[i] != null && array[i]!.Equals(value)) return i;
}
return -1;
}
Just a null check can trigger the pessimisation. |
You can use |
It may be an artifact of measurement then. The null check affects inline decision. Both benchmarks provides very similar result with same |
Egor is probably correct that it's about inlining and devirtualization. Manually extract the called methods and apply public class IndexOfBenchmark
{
private readonly string[] _strArr = Enumerable.Range(1, 10_000).Select(n => n.ToString()).ToArray();
[Params("10000")]
public string Value { get; set; }
[Benchmark]
public int IndexOfStr() => Array.IndexOf(_strArr, Value);
[Benchmark]
public int IndexOfStrNaive()
{
for (int i = 0; i < _strArr.Length; i++)
if (_strArr[i] == Value)
return i;
return -1;
}
[Benchmark]
public int IndexOfAggressiveInlining() => ArrayIndexOf(_strArr, Value, 0, _strArr.Length);
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static unsafe int ArrayIndexOf<T>(T[] array, T value, int startIndex, int count)
where T : IEquatable<T>
{
if (array == null)
{
throw new Exception();
}
if ((uint)startIndex > (uint)array.Length)
{
throw new Exception();
}
if ((uint)count > (uint)(array.Length - startIndex))
{
throw new Exception();
}
return GenericEqualityComparer<T>(array, value, startIndex, count);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static int GenericEqualityComparer<T>(T[] array, T value, int startIndex, int count)
where T : IEquatable<T>
{
int endIndex = startIndex + count;
if (value == null)
{
for (int i = startIndex; i < endIndex; i++)
{
if (array[i] == null) return i;
}
}
else
{
for (int i = startIndex; i < endIndex; i++)
{
if (array[i] != null && array[i]!.Equals(value)) return i;
}
}
return -1;
}
} It just performs much better:
|
Description
Array.IndexOf on string array slower in 2.5 times that naive iterraction and compare.
Configuration
Data
Benchmark result
Benchmark source
Analysis
I think, that problem in JIT optimization of method
System.Collections.Generic.GenericEqualityComparer<T>.IndexOf
In case of split this method to two methods for finding null values and not null values it runs faster.
Result:
The text was updated successfully, but these errors were encountered: