
Significant performance degradation for matrix multiplication compared to ND4J for bigger matrices #197

Open
anatoliy-balakirev opened this issue Mar 12, 2024 · 0 comments

Comments

@anatoliy-balakirev

Hi,

We recently evaluated several linear algebra libraries for our project. EJML showed really good results and we went with it initially, but we then noticed a significant performance drop on bigger matrices. I've created a sample project here: https://github.com/anatoliy-balakirev/ejml-nd4j-benchmark. It is a small JMH benchmark that runs matrix multiplication with both EJML and ND4J (https://github.com/deeplearning4j/deeplearning4j).
I used the following command line (a bit naive, perhaps, since there is only one warmup run and three measurement iterations, but that is enough to highlight the issue):

./mvnw jmh:benchmark -Djmh.f=1 -Djmh.wi=1 -Djmh.i=3 -Djmh.bm=avgt
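
For reference, the core of the benchmark is essentially the following (a minimal sketch with illustrative class and field names, not the exact code from the repo; it assumes EJML's `CommonOps_DDRM.mult` for the pure-Java path and ND4J's `INDArray.mmul` for the native path):

```java
// Sketch of the comparison: EJML multiplies in pure Java via CommonOps_DDRM,
// ND4J dispatches mmul to its native BLAS backend.
import java.util.Random;

import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.ejml.dense.row.RandomMatrices_DDRM;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class MatrixMultiplySketch {

    @Param({"3000", "9441"}) // square case only; the repo also covers 155x9441 * 9441x9441
    int n;

    DMatrixRMaj ejmlA, ejmlB, ejmlC;
    INDArray nd4jA, nd4jB;

    @Setup(Level.Trial)
    public void setup() {
        Random rand = new Random(42);
        ejmlA = RandomMatrices_DDRM.rectangle(n, n, rand);
        ejmlB = RandomMatrices_DDRM.rectangle(n, n, rand);
        ejmlC = new DMatrixRMaj(n, n);
        nd4jA = Nd4j.rand(n, n);
        nd4jB = Nd4j.rand(n, n);
    }

    @Benchmark
    public DMatrixRMaj ejml() {
        CommonOps_DDRM.mult(ejmlA, ejmlB, ejmlC); // pure-Java matmul
        return ejmlC;
    }

    @Benchmark
    public INDArray nd4j() {
        return nd4jA.mmul(nd4jB); // native BLAS matmul
    }
}
```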

The results are as follows:

Benchmark                                               (matrixDimensions)  Mode  Cnt    Score    Error  Units
MatrixOperationBenchmark.testMatrixMultiplicationEjml   155x9441;9441x9441  avgt    3    2.410 ±  0.817   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml  3000x3000;3000x3000  avgt    3    3.843 ±  3.063   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml  3300x3300;3300x3300  avgt    3    5.089 ±  1.766   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml  3500x3500;3500x3500  avgt    3    6.314 ±  4.315   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml  4000x4000;4000x4000  avgt    3    9.395 ±  1.378   s/op
MatrixOperationBenchmark.testMatrixMultiplicationEjml  9441x9441;9441x9441  avgt    3  133.552 ± 92.515   s/op

MatrixOperationBenchmark.testMatrixMultiplicationNd4J   155x9441;9441x9441  avgt    3    0.680 ±  0.511   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J  3000x3000;3000x3000  avgt    3    0.661 ±  0.396   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J  3300x3300;3300x3300  avgt    3    0.793 ±  0.889   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J  3500x3500;3500x3500  avgt    3    0.890 ±  0.573   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J  4000x4000;4000x4000  avgt    3    1.301 ±  0.620   s/op
MatrixOperationBenchmark.testMatrixMultiplicationNd4J  9441x9441;9441x9441  avgt    3   13.279 ±  4.434   s/op

As you can see, at these sizes EJML is roughly 4-10 times slower than ND4J, with the gap widest for the 9441x9441 case. On smaller sizes (the commented-out cases at https://github.com/anatoliy-balakirev/ejml-nd4j-benchmark/blob/main/src/test/java/benchmark/MatrixOperationBenchmark.java#L88-L108, which you can uncomment to try), EJML is actually faster than ND4J.

The full log (which also includes the hardware details logged by ND4J) is here:
benchmark.log

For now we've ended up using EJML up to a certain multiplication cost and switching to ND4J beyond it (a sketch of that fallback is below); we have a lot of matrices at those bigger sizes, so the execution time piles up. Is there any way to bring EJML's performance on par with ND4J at these sizes, or is this roughly the maximum we can get from a pure-Java implementation?
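
For context, the fallback looks roughly like this (a hedged sketch: the class name, the cost estimate, and the cutoff value are illustrative rather than our production code, and the copies between EJML and ND4J buffers add overhead of their own):

```java
// Size-based dispatch: stay on pure-Java EJML below a cost threshold,
// hand larger products to ND4J's native backend. The threshold is
// illustrative and is best picked empirically from measurements like
// the table above.
import org.ejml.data.DMatrixRMaj;
import org.ejml.dense.row.CommonOps_DDRM;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public final class HybridMult {

    // Approximate multiply-add count of C = A * B.
    private static final long EJML_MAX_MULADDS = 8_000_000_000L; // illustrative cutoff

    public static DMatrixRMaj mult(DMatrixRMaj a, DMatrixRMaj b) {
        long cost = (long) a.numRows * a.numCols * b.numCols;
        DMatrixRMaj c = new DMatrixRMaj(a.numRows, b.numCols);
        if (cost <= EJML_MAX_MULADDS) {
            // Small/medium products: pure-Java EJML.
            CommonOps_DDRM.mult(a, b, c);
        } else {
            // Large products: copy into ND4J, multiply natively, copy back.
            // dup('c') forces a row-major result buffer so the flat copy
            // matches DMatrixRMaj's row-major layout.
            INDArray na = Nd4j.create(a.data, new int[]{a.numRows, a.numCols});
            INDArray nb = Nd4j.create(b.data, new int[]{b.numRows, b.numCols});
            double[] flat = na.mmul(nb).dup('c').data().asDouble();
            System.arraycopy(flat, 0, c.data, 0, flat.length);
        }
        return c;
    }
}
```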
