Par performance of hpx::reverse #6232

Open
Johan511 opened this issue May 3, 2023 · 0 comments
Johan511 commented May 3, 2023

Expected Behavior

hpx::reverse() performs worse with the par execution policy than with the seq execution policy.
The same slowdown is mentioned in: https://devblogs.microsoft.com/cppblog/using-c17-parallel-algorithms-for-better-performance/

[Image: ReverseAlgorithm]

Median execution time for 100'000'000 elements
Par : 9787040.0
Seq : 6424720.0
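
For reference, a median timing of this kind can be collected with a small harness like the one below. This is only a sketch, not the benchmark used for the numbers above: `median_time_ns` and `time_sequential_reverse` are hypothetical helper names, the nanosecond unit is an assumption (the issue does not state its unit), and `std::reverse` stands in for the HPX call so the snippet compiles without HPX. With HPX set up, the lambda body would presumably become `hpx::reverse(hpx::execution::par, data.begin(), data.end())` or the `seq` equivalent.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `op` `runs` times and returns the median
// wall-clock time of a single run, in nanoseconds (assumed unit).
template <typename F>
double median_time_ns(F&& op, std::size_t runs)
{
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i)
    {
        auto const start = std::chrono::steady_clock::now();
        op();
        auto const stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::nano>(stop - start).count());
    }
    // nth_element places the median at `mid` (upper median for even counts).
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Stand-in for the measured call: std::reverse instead of hpx::reverse,
// so the sketch has no HPX dependency.
double time_sequential_reverse(std::vector<int>& data, std::size_t runs)
{
    return median_time_ns(
        [&data] { std::reverse(data.begin(), data.end()); }, runs);
}
```

Calling `time_sequential_reverse(v, 11)` on a vector of 100'000'000 elements would give one median sample for the sequential case. Taking the median rather than the mean keeps a single slow run (for example one dominated by first-touch page faults) from skewing the result.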

The performance difference matches the one described in the Microsoft blog post.

perf-stat output:

Par:
Performance counter stats for './par':

      6,856.13 msec task-clock                #    3.624 CPUs utilized          
         1,734      context-switches          #  252.912 /sec                   
            25      cpu-migrations            #    3.646 /sec                   
     1,051,732      page-faults               #  153.400 K/sec                  
27,791,403,155      cycles                    #    4.054 GHz                      (83.55%)
    68,433,024      stalled-cycles-frontend   #    0.25% frontend cycles idle     (83.52%)
   524,413,414      stalled-cycles-backend    #    1.89% backend cycles idle      (83.16%)
21,075,188,028      instructions              #    0.76  insn per cycle         
                                              #    0.02  stalled cycles per insn  (83.47%)
 3,606,953,700      branches                  #  526.092 M/sec                    (83.21%)
     3,547,443      branch-misses             #    0.10% of all branches          (83.12%)

   1.891962963 seconds time elapsed

   5.695293000 seconds user
   1.143055000 seconds sys

Performance counter stats for './par':

   426,032,401      cache-references                                            
    51,362,077      cache-misses              #   12.056 % of all cache refs    
24,980,933,230      cycles                                                      
18,996,816,372      instructions              #    0.76  insn per cycle         
 3,239,373,302      branches                                                    
     1,051,732      faults                                                      
            29      migrations                                                  

   1.605927855 seconds time elapsed

   5.177040000 seconds user
   0.884079000 seconds sys

Seq:
Performance counter stats for './seq':

      2,531.75 msec task-clock                #    1.589 CPUs utilized          
           401      context-switches          #  158.388 /sec                   
            28      cpu-migrations            #   11.060 /sec                   
     1,051,725      page-faults               #  415.414 K/sec                  
 9,805,342,294      cycles                    #    3.873 GHz                      (83.76%)
    13,256,239      stalled-cycles-frontend   #    0.14% frontend cycles idle     (83.86%)
   457,911,915      stalled-cycles-backend    #    4.67% backend cycles idle      (83.12%)
23,989,030,223      instructions              #    2.45  insn per cycle         
                                              #    0.02  stalled cycles per insn  (83.14%)
 4,369,579,747      branches                  #    1.726 G/sec                    (83.40%)
     3,370,898      branch-misses             #    0.08% of all branches          (82.81%)

   1.593337873 seconds time elapsed

   1.519018000 seconds user
   1.014025000 seconds sys

Performance counter stats for './seq':

   422,525,132      cache-references                                            
    20,034,041      cache-misses              #    4.742 % of all cache refs    
 9,436,269,563      cycles                                                      
23,995,389,789      instructions              #    2.54  insn per cycle         
 4,378,401,854      branches                                                    
     1,051,726      faults                                                      
            28      migrations                                                  

   1.492262393 seconds time elapsed

   1.418568000 seconds user
   0.974770000 seconds sys

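The par run has a significantly higher cache-miss rate (12.056% vs. 4.742% of all cache references) and a much lower instruction throughput (0.76 vs. 2.54 insn per cycle), which might be the cause of the slowdown.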
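
To make the comparison explicit, the derived ratios can be recomputed from the raw counters in the second perf stat run of each binary. The snippet below is just a worked-arithmetic sketch: the `PerfCounters` struct and the function names are made up for illustration, and the constants are copied from the output above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the counters reported by perf stat above.
struct PerfCounters
{
    std::uint64_t cache_references;
    std::uint64_t cache_misses;
    std::uint64_t cycles;
    std::uint64_t instructions;
};

// Values copied from the second perf stat run of ./par and ./seq.
constexpr PerfCounters par_run{
    426032401ULL, 51362077ULL, 24980933230ULL, 18996816372ULL};
constexpr PerfCounters seq_run{
    422525132ULL, 20034041ULL, 9436269563ULL, 23995389789ULL};

// Fraction of cache references that missed.
double cache_miss_rate(PerfCounters const& c)
{
    assert(c.cache_references > 0);
    return static_cast<double>(c.cache_misses) /
        static_cast<double>(c.cache_references);
}

// Instructions retired per cycle (IPC).
double instructions_per_cycle(PerfCounters const& c)
{
    assert(c.cycles > 0);
    return static_cast<double>(c.instructions) /
        static_cast<double>(c.cycles);
}
```

With these numbers the par run issues roughly the same number of cache references as the seq run but misses about 2.5 times as often, and it spends about 2.6 times as many cycles while executing fewer instructions. That is consistent with the threads stalling on memory rather than doing extra work.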

@hkaiser hkaiser added this to the 1.10.0 milestone May 3, 2023
@hkaiser hkaiser modified the milestones: 1.10.0, 1.11.0 May 3, 2024