This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

CoreRT slower than regular .NET #8354

Open
kant2002 opened this issue Oct 4, 2020 · 4 comments

Comments

@kant2002
Contributor

kant2002 commented Oct 4, 2020

I was thinking about checking how CoreRT performs on wavelet transforms and decided to use https://github.com/codeprof/TurboWavelets.Net as a starting point.

I migrated the project to the new SDK-style project format and added BenchmarkDotNet benchmarks based on the provided samples.

To my disappointment, regular .NET seems to be faster than CoreRT.

// * Summary *

BenchmarkDotNet=v0.12.1.1420-nightly, OS=Windows 10.0.18363.1082 (1909/November2019Update/19H2)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=5.0.100-rc.1.20452.10
  [Host]     : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
  .NET 5.0   : .NET 5.0.0 (5.0.20.45114), X64 RyuJIT
  CoreRt 5.0 : .NET 5.0.29330.02 @BuiltBy: dlab14-DDVSOWINAGE075 @Branch: master @Commit: 145402e00724acbc9e7636739140fb84f7d64845, X64 AOT


|                Method |        Job |    Runtime |      Mean |    Error |   StdDev | Ratio | RatioSD |
|---------------------- |----------- |----------- |----------:|---------:|---------:|------:|--------:|
| Waveletimageupscaling |   .NET 5.0 |   .NET 5.0 | 155.72 ms | 3.267 ms | 9.426 ms |  1.00 |    0.00 |
| Waveletimageupscaling | CoreRt 5.0 | CoreRt 5.0 | 167.68 ms | 3.303 ms | 9.478 ms |  1.08 |    0.09 |
|                       |            |            |           |          |          |       |         |
|      AdaptiveDeadzone |   .NET 5.0 |   .NET 5.0 |  30.40 ms | 0.588 ms | 0.764 ms |  1.00 |    0.00 |
|      AdaptiveDeadzone | CoreRt 5.0 | CoreRt 5.0 |  33.79 ms | 0.683 ms | 1.763 ms |  1.14 |    0.08 |
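A quick way to check whether these gaps exceed the run-to-run noise BenchmarkDotNet measured is to compare the Mean ± Error intervals, since BenchmarkDotNet reports Error as half of the 99.9% confidence interval. The minimal Python sketch below (the `Result` and `compare` names are illustrative, not part of any library) uses the numbers from the table above. For both methods the intervals do not overlap, and the plain ratios of the means come out at roughly 1.08 and 1.11. BenchmarkDotNet's own Ratio column appears to be computed from the measurement distributions rather than as a quotient of the two means, which is presumably why it shows 1.14 for AdaptiveDeadzone.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Result:
    method: str
    runtime: str
    mean_ms: float
    error_ms: float  # BenchmarkDotNet "Error": half of the 99.9% confidence interval

    def interval(self) -> Tuple[float, float]:
        """Mean +/- Error, i.e. the reported 99.9% confidence interval."""
        return (self.mean_ms - self.error_ms, self.mean_ms + self.error_ms)


def compare(baseline: Result, candidate: Result) -> Tuple[float, bool]:
    """Return (candidate mean / baseline mean, whether the two intervals overlap)."""
    b_lo, b_hi = baseline.interval()
    c_lo, c_hi = candidate.interval()
    overlap = c_lo <= b_hi and b_lo <= c_hi
    return candidate.mean_ms / baseline.mean_ms, overlap


# Mean and Error values copied from the summary table above.
PAIRS = [
    (Result("Waveletimageupscaling", ".NET 5.0", 155.72, 3.267),
     Result("Waveletimageupscaling", "CoreRt 5.0", 167.68, 3.303)),
    (Result("AdaptiveDeadzone", ".NET 5.0", 30.40, 0.588),
     Result("AdaptiveDeadzone", "CoreRt 5.0", 33.79, 0.683)),
]

for base, cand in PAIRS:
    ratio, overlap = compare(base, cand)
    print(f"{base.method}: ratio of means {ratio:.3f}, intervals overlap: {overlap}")
```

Non-overlapping intervals only say the difference was stable within this particular run; they do not say where it comes from.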

So I have some general questions:

  1. Are these results expected for CPU-bound workloads?
  2. What can I do to investigate this particular case more closely?
@MichalStrehovsky
Member

For compute-heavy workloads that don't use things like HW intrinsics, I would expect both to be pretty much on par, since the codegen is the same.

I would run both under PerfView and check:

  • GC Stats - does the GC do more work in one of them?
  • Look at CPU samples - are the same methods hot? Is there something that stands out? If so, I would check disassembly on both and compare if we got worse codegen somewhere.
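Once the per-method CPU samples from both traces have been exported somehow, the "are the same methods hot?" check amounts to diffing two method-to-sample-count tables. The Python sketch below assumes the export is a plain `{method name: exclusive sample count}` mapping and that both traces cover the same number of benchmark iterations, so absolute counts are comparable. The export format, the `diff_profiles` helper, and the method names in the example data are all hypothetical.

```python
from typing import Dict, List, Tuple


def diff_profiles(baseline: Dict[str, int], candidate: Dict[str, int],
                  top: int = 5) -> List[Tuple[str, int, int, int]]:
    """Return (method, baseline samples, candidate samples, delta),
    sorted by the size of the change, largest first."""
    rows = []
    # Sorting the names first makes tie order deterministic.
    for name in sorted(set(baseline) | set(candidate)):
        b = baseline.get(name, 0)
        c = candidate.get(name, 0)
        rows.append((name, b, c, c - b))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows[:top]


# Hypothetical exclusive sample counts; not measured data.
JIT_SAMPLES = {"TransformRows": 500, "TransformCols": 400, "Quantize": 100}
AOT_SAMPLES = {"TransformRows": 500, "TransformCols": 520, "Quantize": 100,
               "Runtime/GC": 80}

for name, b, c, delta in diff_profiles(JIT_SAMPLES, AOT_SAMPLES):
    print(f"{name:15} JIT={b:5} AOT={c:5} delta={delta:+}")
```

A method whose sample count grows noticeably under CoreRT is the one to compare in disassembly. Time that appears only in runtime or GC frames points back at the GC Stats check instead.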

@jkotas
Member

jkotas commented Oct 5, 2020

It is not unusual for the performance of CPU-bound microbenchmarks to be sensitive to memory alignment, code alignment, or other factors, which results in trends like this: dotnet/runtime#39031 (comment). This may be one of these bimodal cases, and you may simply be hitting the lucky or unlucky spots on the spectrum.
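One way to test the bimodal hypothesis is to collect the mean from several separate process launches per runtime (for example by raising BenchmarkDotNet's launch count) and see whether those means fall into two clusters. The Python sketch below uses a deliberately crude heuristic, not BenchmarkDotNet's built-in multimodality detection: it splits the sorted means at the widest gap and calls the data bimodal if that gap is much larger than the typical gap. The `looks_bimodal` helper, its thresholds, and the example data are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def split_at_largest_gap(samples: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Sort the samples and split them at the widest gap between neighbours."""
    xs = sorted(samples)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return xs[: i + 1], xs[i + 1:]


def looks_bimodal(samples: Sequence[float], gap_factor: float = 5.0,
                  min_group: int = 2) -> bool:
    """Crude check: the widest gap must be gap_factor times the median gap,
    and each side of it must hold at least min_group samples.
    The default thresholds are arbitrary."""
    if len(samples) < 2 * min_group:
        return False
    xs = sorted(samples)
    gaps = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    typical = max(median(gaps), 1e-12)  # guard against identical samples
    low, high = split_at_largest_gap(xs)
    return (len(low) >= min_group and len(high) >= min_group
            and max(gaps) >= gap_factor * typical)


# Hypothetical per-launch means in ms; not measured data.
BIMODAL = [155.1, 156.0, 154.8, 167.2, 168.0, 155.5, 167.9, 166.8]
UNIMODAL = [155.1, 156.0, 154.8, 155.9, 155.3, 155.5, 154.9, 156.2]

print(looks_bimodal(BIMODAL), looks_bimodal(UNIMODAL))
```

If the CoreRT launches split into a fast and a slow group while the JIT launches do not (or vice versa), alignment luck is a more likely explanation than a general codegen difference.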

Another potential source of the difference is that RyuJIT in dotnet/corert is several months old at this point. It is possible that the RyuJIT shipping in .NET 5 has bug fixes that make a difference for this micro-benchmark. This will get fixed once we migrate the project to dotnet/runtimelab and pick up up-to-date RyuJIT.

> What can I do to look more closely on this particular case.

Michal's advice in #8354 (comment) is spot on.

@kant2002
Contributor Author

kant2002 commented Oct 6, 2020

@jkotas Thanks for the explanation of the potential root causes. I thought this might be related to the fact that this is a micro-benchmark, but I did not think it could be due to changes in the runtime.

@MichalStrehovsky I will try to take a look. Since my priority was to find an interesting use case where CoreRT would be better than regular .NET, I will have to scratch my head a bit to find one.

@RUSshy you can see my benchmarks here: https://github.com/kant2002/TurboWavelets.Net/tree/kant/benchmarks. These are pretty trivial microbenchmarks; this is not the actual project where I might see some gains.

4 participants
@kant2002 @jkotas @MichalStrehovsky and others