F32CUDA seems too slow #27

Open

convexbrain opened this issue Sep 27, 2022 · 8 comments

Owner

convexbrain commented Sep 27, 2022

Benchmark, profile, and optimize F32CUDA to speed it up.

https://github.com/convexbrain/Totsu/releases/tag/totsu_f32cuda_v0.1.0

convexbrain self-assigned this

Owner Author

convexbrain commented Oct 2, 2022 (edited)

LP benchmark result

https://github.com/convexbrain/Totsu/tree/1f5200599ffd8bdf15e6ce672bcc1c2f0bbc11bb/experimental/benchmark_lp
F32CUDA is faster than FloatGeneric.

CPU
- Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
- RAM: 32.0 GB
GPU
- NVIDIA GeForce RTX 3070
- CUDA cores: 5888
- Core clock: 1725 MHz
- Memory bandwidth: 448.06 GB/s
- Memory: 8192 MB GDDR6

Owner Author

convexbrain commented Oct 5, 2022

QP benchmark result

https://github.com/convexbrain/Totsu/tree/884e36b4fd32d696ddca046af755ad8a2d120a61/experimental/benchmark_qp
F32CUDA is slower than FloatGeneric. 😭

Next step: profile F32CUDA using this benchmark.

Owner Author

convexbrain commented Oct 14, 2022

Profiling result of the QP benchmark

Many memory accesses occur during projection onto the cone.
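To make the memory-access pattern concrete, here is a minimal CPU-side sketch of a Euclidean projection onto a second-order cone. The choice of cone is an assumption (the comment above does not say which cone dominated the profile), and `project_soc` is a hypothetical name, not a Totsu function. The point is that the projection reads the whole vector once to reduce it to a norm and then touches it again to rescale, so a naive GPU port may pay for two passes plus a scalar round trip.

```rust
/// Projects the point (t, x) in place onto the second-order cone
/// { (t, x) : ||x||_2 <= t }.
/// Hypothetical helper for illustration; not taken from Totsu.
fn project_soc(t: &mut f32, x: &mut [f32]) {
    // First pass over x: reduction to the Euclidean norm.
    let norm = x.iter().map(|v| v * v).sum::<f32>().sqrt();

    if norm <= *t {
        // Already inside the cone: nothing to do.
    } else if norm <= -*t {
        // Inside the polar cone: the projection is the origin.
        *t = 0.0;
        x.iter_mut().for_each(|v| *v = 0.0);
    } else {
        // Otherwise project onto the boundary of the cone.
        // Here norm > |t| >= 0, so the division is safe.
        let alpha = (*t + norm) / 2.0;
        let scale = alpha / norm;
        *t = alpha;
        // Second pass over x: rescale every element.
        x.iter_mut().for_each(|v| *v *= scale);
    }
}
```

For example, projecting `t = 0`, `x = [3, 4]` gives `t = 2.5`, `x = [1.5, 2.0]`, which lies on the cone boundary.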

Owner Author

convexbrain commented Jan 4, 2023 (edited)

https://github.com/convexbrain/Totsu/tree/b56407463b691a3f2418510bc43e8a72d5186fc1/experimental/benchmark_qp

Moved as much of the projection onto cones to CUDA as possible.
200 variables (100 primal, 100 dual).

Owner Author

convexbrain commented Jan 4, 2023

400 variables (200 primal, 200 dual).

Owner Author

convexbrain commented Jan 5, 2023

https://github.com/convexbrain/Totsu/tree/77f0e5cc10e7a2d29567352f88135a99ed620be1/experimental/benchmark_qp

Switched from HashMap to FxHashMap.
200 variables (100 primal, 100 dual).
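As a rough illustration of why swapping the hasher can help: `std::collections::HashMap` defaults to a DoS-resistant hasher (SipHash), which is comparatively slow for small integer keys. The sketch below plugs a cheap multiplicative hasher into the standard `HashMap` through `BuildHasherDefault`. `SimpleHasher`, `FastMap`, and `build_lookup` are hypothetical names for illustration, the hash function is a stand-in rather than the actual `rustc-hash` implementation, and the issue does not say what the map stores.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in for a fast, non-cryptographic hasher.
/// This is NOT the rustc-hash implementation, only the same idea.
#[derive(Default)]
struct SimpleHasher {
    state: u64,
}

impl Hasher for SimpleHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Mix each byte into the state with a rotate, xor, and multiply.
        for &b in bytes {
            self.state = (self.state.rotate_left(5) ^ u64::from(b))
                .wrapping_mul(0x9E37_79B9_7F4A_7C15);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// A HashMap that uses the cheap hasher instead of the default SipHash.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<SimpleHasher>>;

/// Builds a small lookup table keyed by index (illustrative data only).
fn build_lookup(n: usize) -> FastMap<usize, f32> {
    let mut map = FastMap::default();
    for i in 0..n {
        map.insert(i, i as f32 * 0.5);
    }
    map
}
```

The trade-off is that a hasher like this gives up resistance to adversarial keys, which is usually acceptable for keys generated inside a solver.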

Owner Author

convexbrain commented Jan 5, 2023 (edited)

https://github.com/convexbrain/Totsu/tree/13b8d378f79445c53b9c9f77fbf4389029423d12/experimental/benchmark_qp

Check the termination criteria only intermittently instead of at every iteration.
200 variables (100 primal, 100 dual).
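The idea of intermittent checks can be sketched with a toy iteration. This is not the Totsu solver loop: the update step, the function name `solve_with_intermittent_checks`, and the `check_every` parameter are all assumptions made for illustration. The stopping criterion is evaluated only on every `check_every`-th iteration, so its cost (on a GPU, typically a reduction plus a device-to-host copy) is paid less often.

```rust
/// Runs a toy fixed-point iteration and evaluates the stopping criterion
/// only every `check_every` iterations.
/// Returns (iterations performed, number of criterion evaluations).
/// Hypothetical sketch; not the Totsu solver loop.
fn solve_with_intermittent_checks(
    x: &mut [f32],
    tol: f32,
    check_every: usize,
    max_iter: usize,
) -> (usize, usize) {
    assert!(check_every > 0);
    let mut checks = 0;

    for iter in 1..=max_iter {
        // Cheap update step, standing in for one solver iteration.
        for v in x.iter_mut() {
            *v *= 0.5;
        }

        // Comparatively expensive criterion, evaluated only now and then.
        if iter % check_every == 0 {
            checks += 1;
            let norm = x.iter().map(|v| v * v).sum::<f32>().sqrt();
            if norm <= tol {
                return (iter, checks);
            }
        }
    }

    (max_iter, checks)
}
```

With `x = [1.0]` and `tol = 1e-3`, checking every iteration stops after 10 iterations and 10 checks, while `check_every = 4` stops after 12 iterations but only 3 checks. The cost is at most `check_every - 1` extra iterations past the point where the criterion is first met.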

Owner Author

convexbrain commented Jan 7, 2023

The benefit of CUDA starts to show at around 800 variables.
The number of iterations does not increase monotonically, probably because the QPs are generated from random numbers.
More fundamentally, the number of iterations is too large.
