Game of life example: dpnp on CPU is 4 times slower than NumPy #1402

Open
antonwolfy opened this issue May 13, 2023 · 3 comments

@antonwolfy
Contributor

Results for Game of life example (running on a laptop with 11th Gen processor and Iris Xe graphics):

| example | numpy | dpnp CPU | dpnp GPU | size |
|---|---|---|---|---|
| game of life | 1 s | 4.8 s | 1.8 s | 8192 x 8192 |

These results show that dpnp on CPU takes more than 4 times as long as NumPy.

@antonwolfy
Contributor Author

The numbers with dpnp=0.12.0:

| example | numpy | dpnp CPU | dpnp GPU | size |
|---|---|---|---|---|
| game of life | 1.03 s | 2.16 s | 0.96 s | 8192 x 8192 x 10 |

dpnp on CPU is now about 2 times faster than before, but it still falls short of the target.

@AlexanderKalistratov
Collaborator

Shouldn't it be closed?

@KimSoungRyoul

KimSoungRyoul commented Sep 8, 2023

@antonwolfy
Hi, I'm not a contributor, but I hope this comment helps.

You can check dpnp's performance yourself by following the steps below.
In my case (Xeon Skylake), I saw a significant performance difference.

docker run -it --cpus=4 --name=intelpython-ksr intelpython/intelpython3_full:2023.1.0-0 bash

# check ENV is valid in your guest OS
(base) root@xxxxxx:/# echo $LD_LIBRARY_PATH
/opt/conda/lib/libfabric:

(base) root@xxxxxxx:/# echo $OCL_ICD_FILENAMES $ OCL_ICD_FILENAMES_RESET
libintelocl.so $ OCL_ICD_FILENAMES_RESET

(base) root@xxxxxx:/# apt update && apt install vim -y
(base) root@xxxxxx:/# git clone https://github.com/IntelPython/dpnp.git
(base) root@xxxxxx:/# pip install pytest pytest-benchmark
(base) root@xxxxxx:/# cd dpnp
(base) root@xxxxxx:/# vi benchmarks/pytest_benchmark/test_random.py
# edit: shrink the test array size, NNUMBERS = 2**26 -> 2**20 (2**26 is too heavy)

# run benchmark
(base) root@xxxxxx:/# pytest benchmarks --benchmark-json=results.json --benchmark-warmup-iterations=1000 --benchmark-sort=name
============================================================================================================= test session starts =============================================================================================================
platform linux -- Python 3.10.8, pytest-7.4.2, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=1000)
rootdir: /dpnp
configfile: setup.cfg
plugins: benchmark-4.0.0
collected 10 items

benchmarks/pytest_benchmark/test_random.py ..........                                                                                                                                                                                   [100%]
...

1. benchmark result ( when Array Size = 2**20 )

  • dpnp is faster than np
-------------------------------------------------------------------------------------- benchmark: 10 tests --------------------------------------------------------------------------------------
Name (time in ms)                Min                 Max                Mean             StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_beta[dpnp]              21.8955 (5.17)      84.6292 (16.42)     24.2552 (5.47)     11.4057 (742.46)    22.1041 (5.00)     0.4180 (89.47)         1;2   41.2283 (0.18)         30           4
test_beta[numpy]            144.1274 (34.00)    145.8178 (28.29)    144.3936 (32.54)     0.3846 (25.04)    144.2299 (32.64)    0.3005 (64.34)         6;3    6.9255 (0.03)         30           4
test_exponential[dpnp]        7.5882 (1.79)       8.6807 (1.68)       7.9727 (1.80)      0.2289 (14.90)      8.0083 (1.81)     0.3177 (68.00)         7;1  125.4287 (0.56)         30           4
test_exponential[numpy]      27.3414 (6.45)      27.4286 (5.32)      27.3496 (6.16)      0.0154 (1.0)       27.3465 (6.19)     0.0057 (1.22)          1;1   36.5636 (0.16)         30           4
test_gamma[dpnp]             23.7672 (5.61)      24.7119 (4.79)      24.1695 (5.45)      0.2659 (17.31)     24.1067 (5.46)     0.4515 (96.65)        13;0   41.3745 (0.18)         30           4
test_gamma[numpy]            72.7834 (17.17)     73.3010 (14.22)     72.8419 (16.41)     0.1204 (7.84)      72.8039 (16.48)    0.0226 (4.83)          3;3   13.7284 (0.06)         30           4
test_normal[dpnp]             9.3821 (2.21)      10.6157 (2.06)       9.6447 (2.17)      0.2335 (15.20)      9.5778 (2.17)     0.2116 (45.29)         3;1  103.6835 (0.46)         30           4
test_normal[numpy]           41.1999 (9.72)      41.4049 (8.03)      41.2479 (9.29)      0.0379 (2.46)      41.2402 (9.33)     0.0175 (3.75)          3;3   24.2437 (0.11)         30           4
test_uniform[dpnp]            4.2386 (1.0)        5.1549 (1.0)        4.4380 (1.0)       0.1406 (9.15)       4.4188 (1.0)      0.0209 (4.48)          2;3  225.3261 (1.0)          30           4
test_uniform[numpy]          14.0905 (3.32)      14.2857 (2.77)      14.1043 (3.18)      0.0344 (2.24)      14.0981 (3.19)     0.0047 (1.0)           1;1   70.9004 (0.31)         30           4
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

However, when the NumPy array is not large enough (NNUMBERS = 2**13 = 8192), the result flips:

2. benchmark result (when Array Size = 2**13)

  • in this case dpnp is slower than np

---------------------------------------------------------------------------------------------- benchmark: 10 tests -----------------------------------------------------------------------------------------------
Name (time in us)                  Min                    Max                  Mean                 StdDev                Median                 IQR            Outliers         OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_beta[dpnp]               420.5331 (3.74)     61,992.8390 (277.19)   3,605.1025 (16.71)    11,695.6739 (>1000.0)    550.2328 (4.84)     255.2830 (433.72)        2;3    277.3846 (0.06)         30           4
test_beta[numpy]            1,123.2123 (9.98)      1,146.9647 (5.13)     1,126.9274 (5.22)          4.1275 (2.46)     1,126.0584 (9.90)       1.8142 (3.08)          2;2    887.3686 (0.19)         30           4
test_exponential[dpnp]        274.4943 (2.44)     20,916.4843 (93.52)    1,427.4976 (6.62)      4,114.2836 (>1000.0)    313.9023 (2.76)     179.3243 (304.66)        2;3    700.5266 (0.15)         30           4
test_exponential[numpy]       214.0552 (1.90)        223.6478 (1.0)        215.7025 (1.0)           1.6761 (1.0)        215.3441 (1.89)       0.5886 (1.0)           2;5  4,636.0148 (1.0)          30           4
test_gamma[dpnp]              437.3230 (3.89)     20,278.0776 (90.67)    2,266.6973 (10.51)     5,464.0116 (>1000.0)    462.3923 (4.06)      15.6760 (26.63)         3;7    441.1705 (0.10)         30           4
test_gamma[numpy]             566.4900 (5.03)        578.1837 (2.59)       569.7493 (2.64)          2.1289 (1.27)       569.5820 (5.01)       1.5460 (2.63)          8;1  1,755.1581 (0.38)         30           4
test_normal[dpnp]             324.0071 (2.88)     21,615.4084 (96.65)    2,640.5660 (12.24)     6,222.1435 (>1000.0)    353.8001 (3.11)     202.7377 (344.44)        3;5    378.7067 (0.08)         30           4
test_normal[numpy]            322.1631 (2.86)        340.1972 (1.52)       324.8747 (1.51)          3.9413 (2.35)       323.8600 (2.85)       1.5870 (2.70)          3;3  3,078.1094 (0.66)         30           4
test_uniform[dpnp]            299.9641 (2.67)     20,060.7888 (89.70)    1,449.2592 (6.72)      3,982.3085 (>1000.0)    486.8992 (4.28)      38.1283 (64.78)         2;7    690.0077 (0.15)         30           4
test_uniform[numpy]           112.5187 (1.0)      17,232.6937 (77.05)      688.6497 (3.19)      3,124.6789 (>1000.0)    113.7946 (1.0)       15.0241 (25.53)         1;1  1,452.1171 (0.31)         30           4
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Data-parallel execution carries a fixed cost per call (such as context-switching overhead), and NumPy is already fast enough on a local desktop.
As the benchmarks above show, (IMO) dpnp is useful only in specialized cases such as batch processing of large amounts of data (e.g. a server with many CPU cores).
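
To make the size dependence concrete, here is a rough back-of-the-envelope sketch. It assumes each call costs a fixed overhead plus a constant per-element cost, which is my simplification and not something the benchmarks prove. The timings are the median values for test_uniform copied from the two tables above; the function and variable names are made up for illustration.

```python
# Median timings in microseconds for test_uniform, copied from the tables above.
N_SMALL, N_LARGE = 2**13, 2**20
median_us = {
    "dpnp": {N_SMALL: 486.8992, N_LARGE: 4418.8},
    "numpy": {N_SMALL: 113.7946, N_LARGE: 14098.1},
}


def fit_linear(times):
    """Fit t(n) = overhead + per_element * n through the two measured points.

    Assumption: cost is linear in the array size. Two points cannot confirm
    that, so treat the result as an order-of-magnitude estimate only.
    """
    per_element = (times[N_LARGE] - times[N_SMALL]) / (N_LARGE - N_SMALL)
    overhead = times[N_SMALL] - per_element * N_SMALL
    return overhead, per_element


dpnp_overhead, dpnp_per_element = fit_linear(median_us["dpnp"])
numpy_overhead, numpy_per_element = fit_linear(median_us["numpy"])

# Array size at which both fitted lines predict the same time.
break_even = (dpnp_overhead - numpy_overhead) / (numpy_per_element - dpnp_per_element)

print(f"dpnp : overhead ~{dpnp_overhead:.0f} us, ~{dpnp_per_element * 1e3:.2f} ns/element")
print(f"numpy: overhead ~{numpy_overhead:.0f} us, ~{numpy_per_element * 1e3:.2f} ns/element")
print(f"estimated break-even size: ~{break_even:.0f} elements")
```

Under this model dpnp has the larger fixed overhead and the smaller per-element cost, and the crossover lands at a few tens of thousands of elements on my machine's numbers. Other hardware or other random distributions may behave differently.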

As for the Game of life performance: it depends on which implementation you used, but in most cases (a Game of life implemented with NumPy) there does not seem to be any performance gain from dpnp's parallelization. (IMO)

The main operations in a Game of life implementation are slicing and sum, which, as far as I can tell, do not benefit much from internal parallelism.
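
For reference, a minimal sketch of what such a slicing-and-sum update usually looks like in NumPy. This is not the code from the dpnp example; `life_step` is a name I picked, and the choice to treat cells outside the board as dead is an assumption.

```python
import numpy as np


def life_step(board: np.ndarray) -> np.ndarray:
    """Compute one generation for a 2-D array of 0/1 values.

    Assumption: cells outside the board count as dead (zero padding).
    """
    rows, cols = board.shape
    padded = np.pad(board, 1)  # one-cell border of zeros
    # Sum the eight shifted views of the padded board: pure slicing and sum.
    neighbours = sum(
        padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    born = (board == 0) & (neighbours == 3)
    survive = (board == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survive).astype(board.dtype)


# Usage: a horizontal "blinker" turns vertical after one generation.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
print(life_step(blinker))
```

Each generation is a handful of whole-array slices and element-wise operations, so every call pays the library's per-call overhead several times over.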

If you want higher performance in Game of life, you should probably parallelize the code at a higher level rather than rely on dpnp (for example, run def update(board) for each cell in parallel).
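
One possible shape for that higher-level parallelism, as an untested-for-speed sketch: split the board into horizontal bands and update each band in its own thread. Everything here (`life_step_banded`, `_update_band`, the `n_bands` parameter, the dead-border rule) is hypothetical and not taken from the dpnp example. Whether threads actually help depends on how much of the NumPy work runs without holding the GIL, which I have not measured.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _update_band(padded: np.ndarray, r0: int, r1: int) -> np.ndarray:
    """Next state for board rows [r0, r1), read from the zero-padded board."""
    rows = r1 - r0
    cols = padded.shape[1] - 2
    # Board row r lives at padded row r + 1, so this window includes a
    # one-row halo above and below the band.
    window = padded[r0 : r1 + 2]
    centre = window[1 : 1 + rows, 1 : 1 + cols]
    neighbours = np.zeros((rows, cols), dtype=np.int64)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            if (dr, dc) != (1, 1):
                neighbours += window[dr : dr + rows, dc : dc + cols]
    alive = (neighbours == 3) | ((centre == 1) & (neighbours == 2))
    return alive.astype(padded.dtype)


def life_step_banded(board: np.ndarray, n_bands: int = 4) -> np.ndarray:
    """One generation, computed band by band in a thread pool.

    Assumption: cells outside the board count as dead.
    """
    padded = np.pad(board, 1)
    edges = np.linspace(0, board.shape[0], n_bands + 1, dtype=int)
    spans = list(zip(edges[:-1], edges[1:]))
    with ThreadPoolExecutor(max_workers=n_bands) as pool:
        bands = list(pool.map(lambda s: _update_band(padded, s[0], s[1]), spans))
    return np.vstack(bands)


# Usage: the result should not depend on how many bands are used.
rng = np.random.default_rng(0)
board = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
print(np.array_equal(life_step_banded(board, 1), life_step_banded(board, 4)))
```

Each band only reads from the shared padded array and writes its own output, so the threads need no locking.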

In other words, Game of life is not a good benchmark to measure the performance of dpnp.

thanks
