
F32CUDA seems too slow #27

Open
convexbrain opened this issue Sep 27, 2022 · 8 comments

@convexbrain
Owner

Benchmark, profile, and optimize F32CUDA to speed it up.

https://github.com/convexbrain/Totsu/releases/tag/totsu_f32cuda_v0.1.0

@convexbrain convexbrain self-assigned this Sep 27, 2022
@convexbrain
Owner Author

convexbrain commented Oct 2, 2022

LP benchmark result:

[Image: Benchmark of LP]

  • CPU
    • Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
    • RAM: 32.0 GB
  • GPU
    • NVIDIA GeForce RTX 3070
    • CUDA cores: 5888
    • Core clock: 1725 MHz
    • Memory bandwidth: 448.06 GB/s
    • Memory: 8192 MB GDDR6

@convexbrain
Owner Author

QP benchmark result:

[Image: Benchmark of QP]

Proceed to profiling using this benchmark.

@convexbrain
Owner Author

Profiling result of the QP benchmark:

  • Projecting onto the cone causes many memory accesses.

[Image: FetJabgUcAIqKNY]
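For reference, a minimal CPU-side sketch of why a cone projection tends to be dominated by memory traffic. It assumes a second-order cone, since the profile above does not say which cone was hit. `project_soc` is a hypothetical helper written for illustration and is not taken from Totsu. The point is that the vector is read once to get the norm and then read and written again to scale it, with very little arithmetic per element.

```rust
/// Projects the point (t, x) onto the second-order cone
/// { (t, x) : ||x||_2 <= t } in place and returns the new t.
/// Hypothetical helper for illustration; not Totsu's implementation.
fn project_soc(t: f32, x: &mut [f32]) -> f32 {
    // First pass over the vector: Euclidean norm of x.
    let norm = x.iter().map(|v| v * v).sum::<f32>().sqrt();

    if norm <= t {
        // Already inside the cone: the point is its own projection.
        t
    } else if norm <= -t {
        // Inside the polar cone: the projection is the origin.
        x.iter_mut().for_each(|v| *v = 0.0);
        0.0
    } else {
        // Otherwise project onto the boundary of the cone.
        // Second pass over the vector: scale every element.
        let alpha = (norm + t) / (2.0 * norm);
        x.iter_mut().for_each(|v| *v *= alpha);
        alpha * norm
    }
}
```

Calling `project_soc(0.0, &mut x)` with `x = [3.0, 4.0]` scales `x` by one half and returns the new `t`. On a GPU the same two passes become a reduction followed by an element-wise kernel, so the cost is mostly loads and stores rather than arithmetic.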

@convexbrain
Owner Author

convexbrain commented Jan 4, 2023

https://github.com/convexbrain/Totsu/tree/b56407463b691a3f2418510bc43e8a72d5186fc1/experimental/benchmark_qp

  • Moved as much of the projection onto cones to CUDA as possible.
  • 200 variables (100 primal, 100 dual).

[Image attachment]

@convexbrain
Owner Author

  • 400 variables (200 primal, 200 dual).

[Image attachment]

@convexbrain
Owner Author

https://github.com/convexbrain/Totsu/tree/77f0e5cc10e7a2d29567352f88135a99ed620be1/experimental/benchmark_qp

  • Use FxHashMap instead of HashMap.
  • 200 variables (100 primal, 100 dual).

[Image attachment]
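A rough sketch of what swapping the hasher looks like, kept self-contained by using only the standard library. `FxLikeHasher`, `FastMap`, and `build_index` are hypothetical names, and the hasher is only an Fx-style stand-in (rotate, xor, multiply by a fixed odd constant) for the one behind `FxHashMap` in the `rustc-hash` crate. What the map keys actually are in F32CUDA is not stated here, so `usize` keys are an assumption.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Fx-style hasher: cheap, not DoS-resistant.
/// Illustrative stand-in, not the `rustc-hash` implementation.
#[derive(Default)]
struct FxLikeHasher {
    state: u64,
}

// Fixed odd multiplier used to mix each word into the state.
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

impl FxLikeHasher {
    fn add_word(&mut self, word: u64) {
        self.state = (self.state.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxLikeHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Consume the input eight bytes at a time, zero-padding the tail.
        for chunk in bytes.chunks(8) {
            let mut buf = [0u8; 8];
            buf[..chunk.len()].copy_from_slice(chunk);
            self.add_word(u64::from_le_bytes(buf));
        }
    }

    // Integer keys skip the byte loop entirely.
    fn write_u64(&mut self, i: u64) {
        self.add_word(i);
    }

    fn write_usize(&mut self, i: usize) {
        self.add_word(i as u64);
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// Same `HashMap`, different hasher: a drop-in type alias.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FxLikeHasher>>;

/// Maps each key to its position in `keys` (hypothetical usage).
fn build_index(keys: &[usize]) -> FastMap<usize, usize> {
    let mut map = FastMap::default();
    for (pos, &key) in keys.iter().enumerate() {
        map.insert(key, pos);
    }
    map
}
```

The default `HashMap` hasher (SipHash) is designed to resist hash-flooding attacks, which is unnecessary for keys generated inside the solver, so a cheaper hasher is a reasonable trade when lookups sit on the hot path.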

@convexbrain
Owner Author

convexbrain commented Jan 5, 2023

https://github.com/convexbrain/Totsu/tree/13b8d378f79445c53b9c9f77fbf4389029423d12/experimental/benchmark_qp

  • Check the termination criteria only intermittently instead of on every iteration.
  • 200 variables (100 primal, 100 dual).

[Image attachment]
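A minimal sketch of the intermittent-check idea, assuming a generic iterative solver. `iterate_with_sparse_checks`, `check_every`, and the closure parameters are hypothetical names and do not reflect Totsu's API. The loop evaluates the (possibly expensive) criteria only on every `check_every`-th iteration.

```rust
/// Runs `step` up to `max_iter` times and evaluates `converged` only on
/// every `check_every`-th iteration. Returns the iteration at which the
/// criteria were first seen to hold, or `None` if they never were.
/// Hypothetical helper for illustration; not Totsu's API.
fn iterate_with_sparse_checks<T>(
    state: &mut T,
    max_iter: usize,
    check_every: usize,
    step: impl Fn(&mut T),
    converged: impl Fn(&T) -> bool,
) -> Option<usize> {
    assert!(check_every > 0, "check_every must be positive");
    for iter in 1..=max_iter {
        // One solver iteration.
        step(state);
        // Criteria are skipped unless this is a checking iteration.
        if iter % check_every == 0 && converged(state) {
            return Some(iter);
        }
    }
    None
}
```

The trade-off is that the solver may run up to `check_every - 1` iterations past the point where the criteria first hold. That can still be a net win when each check is costly, for example if it forces a device-to-host transfer, which is an assumption about the CUDA case rather than something measured here.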

@convexbrain
Owner Author

[Image: Benchmark of QP (1)]

  • CUDA starts to pay off at around 800 variables.
  • The number of iterations does not increase monotonically with problem size, probably because the QPs are randomly generated.
  • More fundamentally, the number of iterations is too large.
