The documentation should describe the basic steps needed to get the best performance. For example: setting the target CPU to native in .cargo/config.toml, enabling LTO, and turning on feature flags in dependencies, such as use-intrinsics in half. It should also say which exr variants are faster, and so on.
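For reference, the first two settings could look roughly like the sketch below. This is an untested starting point, not a recommendation: whether either option helps exr at all is exactly what still needs measuring, and target-cpu=native produces binaries that may not run on other machines.

```toml
# .cargo/config.toml (in the project using exr)
[build]
# Let the compiler use every instruction set the build machine supports.
rustflags = ["-C", "target-cpu=native"]

# Cargo.toml (of the same project)
[profile.release]
# Enable "fat" link-time optimization across all crates.
lto = "fat"
```

The use-intrinsics flag would be switched on through the features list of the half dependency in Cargo.toml; it is left out here because the right version to pin is not stated in this thread.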
We should also back that up with benchmarks. I doubt LTO is going to be much of a gain. -C codegen-units=1 is also worth a shot. And if -Ctarget-cpu=native does indeed help, we should just identify the functions that get a speedup and multiversion them using the multiversion crate - that way the speedups will be accessible to everyone without the need to use -Ctarget-cpu=native.
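To illustrate what multiversioning a function means, here is a minimal hand-written sketch that uses only the standard library rather than the multiversion crate, which automates this kind of dispatch. The function names and the scaling loop are hypothetical and are not taken from exr; the idea is that the same loop is compiled twice, once with AVX2 enabled, and the faster copy is picked at runtime when the CPU supports it.

```rust
/// Baseline version, compiled for the default target CPU.
/// `scale_generic`, `scale_avx2` and `scale` are hypothetical names.
pub fn scale_generic(samples: &mut [f32], gain: f32) {
    for s in samples.iter_mut() {
        *s *= gain;
    }
}

/// Same loop, but the compiler is allowed to emit AVX2 instructions here.
/// Marked unsafe because calling it on a CPU without AVX2 is undefined behavior.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn scale_avx2(samples: &mut [f32], gain: f32) {
    for s in samples.iter_mut() {
        *s *= gain;
    }
}

/// Public entry point: picks the best available version at runtime.
pub fn scale(samples: &mut [f32], gain: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked on the line above.
            unsafe { scale_avx2(samples, gain) };
            return;
        }
    }
    // Fallback for CPUs without AVX2 and for non-x86 targets.
    scale_generic(samples, gain);
}
```

Callers only ever use scale, so a binary built without -Ctarget-cpu=native still gets the AVX2 path on machines that support it. Whether this pays off for any particular function is something the benchmarks would have to show.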
The latest version of the half crate now uses the f16 conversion intrinsics on stable Rust, so reading to f16 will be a lot faster with half v2.3.1 and later.