Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalene blind/not working when executing python code compiled with pythran #713

Open
mathaefele opened this issue Oct 12, 2023 · 0 comments

Comments

@mathaefele
Copy link

mathaefele commented Oct 12, 2023

Describe the bug
We are teaching python in our HPC lecture in a master in Math at Pau university, France. We are using scalene with the students to profile code and get a feeling of interpreted code vs native code thanks to the nice line profiling provided by scalene. Once the hot spot found we teach them how to convert the piece of code to compiled C++ code using pythran. Redoing the profiling afterwards, execution time is smaller and we expect to see the routine exhibiting some native time. But we see nothing, scalene seems to be blind.

To Reproduce
mse_numpy.py

import numpy as np

#pythran export py_descent(float64[], float64[], float64, int)
def py_descent(x, y, mu, N_epochs):
    N = x.shape[0]
    f = 2 / N
    err = np.zeros(N)
    w = np.zeros(2)
    grad = np.zeros(2)

    for _ in range(N_epochs):
        err[:] = y - (w[0] + w[1] * x)
        grad[:] = f * err.sum(), f * (err.dot(x))
        w[:] = w + mu * grad
    return w

run.py

import time
import numpy as np

from mse_numpy import py_descent
#from mse_loop import py_descent                                                                                   
#from mse_cython import py_descent                                                                                 

def generate_datas(N):
    np.random.seed(444)
    sigma = 0.1
    noise = sigma * np.random.randn(N)
    X = np.linspace(0, 2, N)
    Y = 3 + 2 * X + noise
    return X,Y

N = 10000
X,Y = generate_datas(N)

try:
    scalene_profiler.start()
except:
    pass

t0 = time.time()
w = py_descent(X, Y, mu=0.001, N_epochs=10000)
t1 = time.time()

try:
    scalene_profiler.stop()
except:
    pass

print(w)
print(f'Solve time: {t1-t0} seconds')

Running the pure python version is fine:

(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ scalene --off run.py 
[2.96320573 2.0312119 ]
Solve time: 10.258972644805908 seconds
                                            /home/user/m/mhaefele/hands-on_optimize/mse_numpy.py: % of time = 100.00% (10.260s) out of 10.260s.                                             
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷                                                                                                                   
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │                                                                                                                   
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/home/user/m/mhaefele/hands-on_optimize/mse_numpy.py                                                               
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │       │       │       │        │       │               │       │import numpy as np                                                                                                 
     2 │       │       │       │        │       │               │       │                                                                                                                   
     3 │       │       │       │        │       │               │       │                                                                                                                   
     4 │       │       │       │        │       │               │       │#pythran export py_descent(float64[], float64[], float64, int)                                                     
     5 │       │       │       │        │       │               │       │def py_descent(x, y, mu, N_epochs):                                                                                
     6 │       │       │       │        │       │               │       │    N = x.shape[0]                                                                                                 
     7 │       │       │       │        │       │               │       │    f = 2 / N                                                                                                      
     8 │       │       │       │        │       │               │       │    err = np.zeros(N)                                                                                              
     9 │       │       │       │        │       │               │       │    w = np.zeros(2)                                                                                                
    10 │       │       │       │        │       │               │       │    grad = np.empty(2)                                                                                             
    11 │       │       │       │        │       │               │       │                                                                                                                   
    12 │       │       │       │        │       │               │       │    for _ in range(N_epochs):                                                                                      
    13 │       │       │       │        │       │               │   738 │        err[:] = y - (w[0] + w[1] * x)                                                                             
    14 │   53% │    1% │  45%  │        │       │               │       │        grad[:] = f * err.sum(), f * (err.dot(x))                                                                  
    15 │       │       │       │        │       │               │       │        w[:] = w + mu * grad                                                                                       
    16 │       │       │       │        │       │               │       │    return w                                                                                                       
    17 │       │       │       │        │       │               │       │                                                                                                                   
       │       │       │       │        │       │               │       │                                                                                                                   
╶──────┼───────┼───────┼───────┼────────┼───────┼───────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╴
       │       │       │       │        │       │               │       │function summary for /home/user/m/mhaefele/hands-on_optimize/mse_numpy.py                                          
     5 │   53% │    1% │  45%  │        │       │               │       │py_descent 

Now running the pythran compiled version gives a smaller execution time:

(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ pythran mse_numpy.py 
(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ python run.py 
[2.96320573 2.0312119 ]
Solve time: 4.952760934829712 seconds

However, running with scalene profiling:

(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ scalene --off run.py 
[2.96320573 2.0312119 ]
Solve time: 10.28561806678772 seconds
Scalene: Program did not run for long enough to profile.
  1. Total execution time is very similar to the pure python execution, so I guess it is this one that is executed and not the compiled one
  2. Scalene does not see anything

Weird also, when running scalene with python -m scalene does give a profile, the pure pyhton one (I guess, according to execution time):

(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ python -m scalene run.py
[2.96320573 2.0312119 ]
Solve time: 11.820851802825928 seconds
                                                                   Memory usage: ███ (max: 10.191 MB, growth rate: 102%)                                                                    
                                               /home/user/m/mhaefele/hands-on_optimize/run.py: % of time = 100.00% (11.822s) out of 11.822s.                                                
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷                                                                                                                   
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │                                                                                                                   
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/home/user/m/mhaefele/hands-on_optimize/run.py                                                                     
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │       │       │       │        │       │               │       │import time                                                                                                        
     2 │       │    2% │   1%  │ 100%   │   10M │█ 100%         │    29 │import numpy as np                                                                                                 
     3 │       │       │       │        │       │               │       │                                                                                                                   
     4 │       │       │       │        │       │               │       │from mse_numpy import py_descent                                                                                   
     5 │       │       │       │        │       │               │       │#from mse_loop import py_descent                                                                                   
     6 │       │       │       │        │       │               │       │#from mse_cython import py_descent                                                                                 
     7 │       │       │       │        │       │               │       │                                                                                                                   
     8 │       │       │       │        │       │               │       │def generate_datas(N):                                                                                             
     9 │       │       │       │        │       │               │       │    np.random.seed(444)                                                                                            
    10 │       │       │       │        │       │               │       │    sigma = 0.1                                                                                                    
    11 │       │       │       │        │       │               │       │    noise = sigma * np.random.randn(N)                                                                             
    12 │       │       │       │        │       │               │       │    X = np.linspace(0, 2, N)                                                                                       
    13 │       │       │       │        │       │               │       │    Y = 3 + 2 * X + noise                                                                                          
    14 │       │       │       │        │       │               │       │    return X,Y                                                                                                     
    15 │       │       │       │        │       │               │       │                                                                                                                   
    16 │       │       │       │        │       │               │       │N = 100000                                                                                                         
    17 │       │       │       │        │       │               │       │X,Y = generate_datas(N)                                                                                            
    18 │       │       │       │        │       │               │       │                                                                                                                   
    19 │       │       │       │        │       │               │       │try:                                                                                                               
    20 │       │       │       │        │       │               │       │    scalene_profiler.start()                                                                                       
    21 │       │       │       │        │       │               │       │except:                                                                                                            
    22 │       │       │       │        │       │               │       │    pass                                                                                                           
    23 │       │       │       │        │       │               │       │                                                                                                                   
    24 │       │       │       │        │       │               │       │t0 = time.time()                                                                                                   
    25 │       │   57% │  39%  │        │       │               │   643 │w = py_descent(X, Y, mu=0.001, N_epochs=10000)                                                                     
    26 │       │       │       │        │       │               │       │t1 = time.time()                                                                                                   
    27 │       │       │       │        │       │               │       │                                                                                                                   
    28 │       │       │       │        │       │               │       │try:                                                                                                               
    29 │       │       │       │        │       │               │       │    scalene_profiler.stop()                                                                                        
    30 │       │       │       │        │       │               │       │except:                                                                                                            
    31 │       │       │       │        │       │               │       │    pass                                                                                                           
    32 │       │       │       │        │       │               │       │                                                                                                                   
    33 │       │       │       │        │       │               │       │print(w)                                                                                                           
    34 │       │       │       │        │       │               │       │print(f'Solve time: {t1-t0} seconds')                                                                              
    35 │       │       │       │        │       │               │       │                                                                                                                   
       │       │       │       │        │       │               │       │                                                                                                                   
╶──────┼───────┼───────┼───────┼────────┼───────┼───────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╴
       │       │       │       │        │       │               │       │function summary for /home/user/m/mhaefele/hands-on_optimize/run.py                                                
     8 │       │       │       │        │       │               │       │generate_datas                                                                                                     
       ╵       ╵       ╵       ╵        ╵       ╵               ╵       ╵                                                                                                                   
Top AVERAGE memory consumption, by line:
(1)     2:    10 MB                                                                                                                                                                          
Top PEAK memory consumption, by line:
(1)     2:    10 MB                          

Expected behavior
Execution using the compiled version with the right execution time + a pure native timing for the py_descent routine

(hpc) mhaefele@slurm-ens1:~/hands-on_optimize$ python -m scalene run.py
[2.96320573 2.0312119 ]
Solve time:  4.9 seconds
                                                                   Memory usage: ███ (max: 10.191 MB, growth rate: 102%)                                                                    
                                               /home/user/m/mhaefele/hands-on_optimize/run.py: % of time = 100.00% (4.9s) out of 4.9s.                                                
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷                                                                                                                   
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │                                                                                                                   
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │/home/user/m/mhaefele/hands-on_optimize/run.py                                                                     
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸
     1 │       │       │       │        │       │               │       │import time                                                                                                        
     2 │       │    2% │   1%  │ 100%   │   10M │█ 100%         │    29 │import numpy as np                                                                                                 
     3 │       │       │       │        │       │               │       │                                                                                                                   
     4 │       │       │       │        │       │               │       │from mse_numpy import py_descent                                                                                   
     5 │       │       │       │        │       │               │       │#from mse_loop import py_descent                                                                                   
     6 │       │       │       │        │       │               │       │#from mse_cython import py_descent                                                                                 
     7 │       │       │       │        │       │               │       │                                                                                                                   
     8 │       │       │       │        │       │               │       │def generate_datas(N):                                                                                             
     9 │       │       │       │        │       │               │       │    np.random.seed(444)                                                                                            
    10 │       │       │       │        │       │               │       │    sigma = 0.1                                                                                                    
    11 │       │       │       │        │       │               │       │    noise = sigma * np.random.randn(N)                                                                             
    12 │       │       │       │        │       │               │       │    X = np.linspace(0, 2, N)                                                                                       
    13 │       │       │       │        │       │               │       │    Y = 3 + 2 * X + noise                                                                                          
    14 │       │       │       │        │       │               │       │    return X,Y                                                                                                     
    15 │       │       │       │        │       │               │       │                                                                                                                   
    16 │       │       │       │        │       │               │       │N = 100000                                                                                                         
    17 │       │       │       │        │       │               │       │X,Y = generate_datas(N)                                                                                            
    18 │       │       │       │        │       │               │       │                                                                                                                   
    19 │       │       │       │        │       │               │       │try:                                                                                                               
    20 │       │       │       │        │       │               │       │    scalene_profiler.start()                                                                                       
    21 │       │       │       │        │       │               │       │except:                                                                                                            
    22 │       │       │       │        │       │               │       │    pass                                                                                                           
    23 │       │       │       │        │       │               │       │                                                                                                                   
    24 │       │       │       │        │       │               │       │t0 = time.time()                                                                                                   
    25 │       │   97% │       │        │       │               │   643 │w = py_descent(X, Y, mu=0.001, N_epochs=10000)                                                                     
    26 │       │       │       │        │       │               │       │t1 = time.time()                                                                                                   
    27 │       │       │       │        │       │               │       │                                                                                                                   
    28 │       │       │       │        │       │               │       │try:                                                                                                               
    29 │       │       │       │        │       │               │       │    scalene_profiler.stop()                                                                                        
    30 │       │       │       │        │       │               │       │except:                                                                                                            
    31 │       │       │       │        │       │               │       │    pass                                                                                                           
    32 │       │       │       │        │       │               │       │                                                                                                                   
    33 │       │       │       │        │       │               │       │print(w)                                                                                                           
    34 │       │       │       │        │       │               │       │print(f'Solve time: {t1-t0} seconds')                                                                              
    35 │       │       │       │        │       │               │       │                                                                                                                   
       │       │       │       │        │       │               │       │                                                                                                                   
╶──────┼───────┼───────┼───────┼────────┼───────┼───────────────┼───────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╴
       │       │       │       │        │       │               │       │function summary for /home/user/m/mhaefele/hands-on_optimize/run.py                                                
     8 │       │       │       │        │       │               │       │generate_datas                                                                                                     
       ╵       ╵       ╵       ╵        ╵       ╵               ╵       ╵                                                                                                                   
Top AVERAGE memory consumption, by line:
(1)     2:    10 MB                                                                                                                                                                          
Top PEAK memory consumption, by line:
(1)     2:    10 MB                          

Desktop (please complete the following information):

  • OS: Linux Debian 10 Buster
  • Browser Firefox
  • Version 1.5.31.1 (2023.09.15)

Additional remark
By the way, we have exactly the same behavior with a refactored version of mse_numpy.py in a cython version. However, the profiling is as expected for a numba version.

Thanks for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant