Slowdown and memory increase with time #213

Open
bfonta opened this issue Aug 28, 2023 · 7 comments

Comments

@bfonta

bfonta commented Aug 28, 2023

I'm using pylhe to loop over several LHE files, each containing 100K events. Running the snippet below on an lxplus machine (CentOS Linux release 7.9.2009), one can see that iterations become slower over time, and eventually the job gets killed for using too much memory.

```python
import pylhe
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
atime = time.time()
for ievt, evt in enumerate(pylhe.read_lhe(afile)):  # alternatively: pylhe.read_lhe_with_attributes
    if ievt % 5000 == 0:
        print(time.time() - atime)  # seconds spent on the last 5000 events
        atime = time.time()
        print(' - {} events processed'.format(ievt))
```

The slowdown becomes significant at around iteration 40K to 50K. I would expect no memory increase, given that we are dealing with a generator.
Is the above behavior expected? I'm using Python 3.5.6 (GCC 6.2.0).
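Whether a generator keeps memory flat depends on what it holds on to internally, not only on it being a generator. As a generic illustration (a sketch using the standard library's `xml.etree.ElementTree.iterparse`; how pylhe's reader is implemented is not examined in this thread), the hypothetical helpers `make_lhe_like_xml`, `iter_events` and `retained_bytes` below stream a simplified, made-up LHE-like document. Calling `element.clear()` alone still leaves every finished `<event>` attached to the root element, so retained memory grows with the number of events; also clearing the root keeps it bounded.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_lhe_like_xml(n_events):
    """Build a hypothetical, heavily simplified LHE-like document as bytes."""
    body = "<event>\n 1 2 3 4 5\n</event>\n" * n_events
    return ("<LesHouchesEvents>\n" + body + "</LesHouchesEvents>\n").encode()


def iter_events(source, clear_root):
    """Stream the text of each <event> element from a file-like XML source."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first "start" event delivers the root element
    for kind, element in context:
        if kind == "end" and element.tag == "event":
            yield element.text
            element.clear()  # empties this element, but the root still references it
            if clear_root:
                root.clear()  # detach finished children so they can be freed


def retained_bytes(clear_root, n_events=20000):
    """Traced memory while the generator is paused on its last event."""
    data = make_lhe_like_xml(n_events)
    tracemalloc.start()
    current = 0
    for i, _ in enumerate(iter_events(io.BytesIO(data), clear_root)):
        if i == n_events - 1:
            current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


print("without root.clear():", retained_bytes(clear_root=False))
print("with root.clear():   ", retained_bytes(clear_root=True))
```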

@bfonta
Author

bfonta commented Aug 28, 2023

As a cross-check, I also tried removing enumerate (which is lazy anyway), but the slowdown looks identical:

```python
import pylhe
import time

afile = "/afs/cern.ch/work/b/bfontana/public/Singlet_TManualV3_all_M280p00_ST0p14_L463p05_K1p00_cmsgrid_final.lhe"
events = pylhe.read_lhe_with_attributes(afile)

atime = time.time()
for ievt in range(100000):
    if ievt % 5000 == 0:
        print(time.time() - atime)  # seconds spent on the last 5000 events
        atime = time.time()
        print(' - {} events processed'.format(ievt))

    # Advance the shared generator by exactly one event
    # (next(it.islice(events, 1, 2)) would skip one and consume two per iteration)
    event = next(events)
```
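`itertools.islice` does not rewind a generator it is given: `islice(events, 1, 2)` discards one item and returns the next, so each `next(it.islice(events, 1, 2))` call consumes two events from the shared generator, while a bare `next(events)` consumes exactly one. A minimal sketch with a hypothetical stand-in generator, `numbered_events`:

```python
import itertools as it


def numbered_events():
    """Hypothetical stand-in for an event generator such as pylhe.read_lhe()."""
    n = 0
    while True:
        yield n
        n += 1


events = numbered_events()
# islice(events, 1, 2) skips one item and returns the next: two items consumed per call
first = next(it.islice(events, 1, 2))
second = next(it.islice(events, 1, 2))
print(first, second)  # 1 3

# next() on the shared generator consumes exactly one item per call
events = numbered_events()
a, b = next(events), next(events)
print(a, b)  # 0 1
```

With a 100K-event file, a loop that consumes two events per iteration runs out of input after 50K iterations and then raises `StopIteration`.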

@matthewfeickert
Member

matthewfeickert commented Aug 28, 2023

> I'm using Python 3.5.6 (GCC 6.2.0).

In that case you're using a Python version that hasn't been supported since PR #47, i.e. since before pylhe v0.1.0.

Please reproduce your issue with a modern, supported Python version (pylhe supports Python 3.8 or newer) and specify your environment: the pylhe version, plus a way to reproduce the minimal environment needed to trigger the behavior.

```console
$ eol python
┌───────┬────────────┬─────────┬────────────────┬────────────┬────────────┐
│ cycle │  release   │ latest  │ latest release │  support   │    eol     │
├───────┼────────────┼─────────┼────────────────┼────────────┼────────────┤
│ 3.11  │ 2022-10-24 │ 3.11.5  │   2023-08-24   │ 2024-04-01 │ 2027-10-24 │
│ 3.10  │ 2021-10-04 │ 3.10.13 │   2023-08-24   │ 2023-04-05 │ 2026-10-04 │
│ 3.9   │ 2020-10-05 │ 3.9.18  │   2023-08-24   │ 2022-05-17 │ 2025-10-05 │
│ 3.8   │ 2019-10-14 │ 3.8.18  │   2023-08-24   │ 2021-05-03 │ 2024-10-14 │
│ 3.7   │ 2018-06-26 │ 3.7.17  │   2023-06-05   │ 2020-06-27 │ 2023-06-27 │
│ 3.6   │ 2016-12-22 │ 3.6.15  │   2021-09-03   │ 2018-12-24 │ 2021-12-23 │
│ 3.5   │ 2015-09-12 │ 3.5.10  │   2020-09-05   │   False    │ 2020-09-13 │
│ 3.4   │ 2014-03-15 │ 3.4.10  │   2019-03-18   │   False    │ 2019-03-18 │
│ 3.3   │ 2012-09-29 │ 3.3.7   │   2017-09-19   │   False    │ 2017-09-29 │
│ 2.7   │ 2010-07-03 │ 2.7.18  │   2020-04-19   │   False    │ 2020-01-01 │
│ 2.6   │ 2008-10-01 │ 2.6.9   │   2013-10-29   │   False    │ 2013-10-29 │
└───────┴────────────┴─────────┴────────────────┴────────────┴────────────┘
```
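A minimal sketch of collecting the environment details requested above, assuming Python 3.8+ (where `importlib.metadata` is available). The helper name `environment_report` and the choice of fields are illustrative, not something pylhe provides:

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("pylhe",)):
    """Collect interpreter and package versions worth pasting into a bug report."""
    report = {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = version(name)  # installed distribution version
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```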

@bfonta
Author

bfonta commented Aug 29, 2023

If you have access to an lxplus machine, you can run the following, where test.py is one of the scripts above (you should have access to the input file):

```shell
# python 3.9.12 and pylhe 0.7.0
source /cvmfs/sft.cern.ch/lcg/views/LCG_103/x86_64-centos7-gcc11-opt/setup.sh
python test.py
```

As an alternative, I've also run the script in a mamba environment:

```shell
# python 3.11.5 and pylhe 0.7.0
mamba create -n TestPyLHE python=3 pylhe
mamba activate TestPyLHE

# the desired Python version is not picked up by default, but the interpreter below has pylhe 0.7.0
python3.11 test.py
```

Both methods lead to the behavior reported in the first post.
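To quantify the behavior in either environment, the timing loop from the first post could be extended to record memory per chunk as well. A sketch using the standard library's `tracemalloc` (which only sees allocations made through Python's allocators and itself slows iteration down); `profile_stream` and the deliberately leaky `leaky_events` test generator are hypothetical names:

```python
import time
import tracemalloc


def profile_stream(events, report_every=5000):
    """Consume `events`, returning (items_seen, seconds_for_chunk, traced_MB) rows."""
    rows = []
    tracemalloc.start()  # tracing adds overhead, so absolute timings will be inflated
    chunk_start = time.perf_counter()
    for count, _ in enumerate(events, start=1):
        if count % report_every == 0:
            now = time.perf_counter()
            current, _peak = tracemalloc.get_traced_memory()
            rows.append((count, now - chunk_start, current / 1e6))
            chunk_start = now
    tracemalloc.stop()
    return rows


def leaky_events(n, sink):
    """Hypothetical generator that keeps a reference to everything it produces."""
    for i in range(n):
        sink.append(bytes(1000))
        yield i


if __name__ == "__main__":
    for n, seconds, mb in profile_stream(leaky_events(20000, []), report_every=5000):
        print(f"{n} events: {seconds:.3f} s, {mb:.1f} MB traced")
```

With pylhe installed, the reader from the first post could be passed in directly, for example `profile_stream(pylhe.read_lhe(afile))` with `afile` being the path used there; a steadily rising MB column would indicate that references to past events are being retained somewhere.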

@bfonta
Author

bfonta commented Sep 7, 2023

@matthewfeickert Is there any update?

@eduardo-rodrigues
Member

Hello @bfonta, I confess I'm too busy at the moment to profile and investigate this in detail. If you fancy contributing, maybe you could dig deeper with, say, pyinstrument?
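pyinstrument samples where time is spent; since the report is also about memory growth, a complementary standard-library technique (swapped in here, not one proposed in the thread) is to diff two `tracemalloc` snapshots taken while the generator is paused, which points at the source lines holding on to new memory. `top_growth` and `leaky_events` are hypothetical names, and the `warmup`/`window` sizes are arbitrary:

```python
import itertools
import tracemalloc


def top_growth(events, warmup=1000, window=10000, limit=5):
    """Return the source lines whose retained memory grew most over `window` items."""
    iterator = iter(events)
    tracemalloc.start()
    # Let start-up allocations (file handles, parser set-up, caches) settle first
    for _ in itertools.islice(iterator, warmup):
        pass
    before = tracemalloc.take_snapshot()
    for _ in itertools.islice(iterator, window):
        pass
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # StatisticDiff entries, largest absolute size change first
    return after.compare_to(before, "lineno")[:limit]


def leaky_events(n, sink):
    """Hypothetical generator that keeps every item it produces alive."""
    for i in range(n):
        sink.append(bytes(1000))
        yield i


if __name__ == "__main__":
    for stat in top_growth(leaky_events(20000, [])):
        print(stat)
```

Passing `pylhe.read_lhe(afile)` instead of the toy generator should list the lines inside the reader (or the XML parser it relies on) whose allocations survive from one event to the next.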

@bfonta
Copy link
Author

bfonta commented Sep 7, 2023

I pinged because of an approaching deadline. I can try to have a look, but I am also quite busy at the moment. Thank you for the suggestion, though.

@eduardo-rodrigues
Member

I appreciate and understand your situation. From our side, I can also say that a community endeavour only works if there is at least some community engagement, and even the simplest contributions are very welcome (this one, unfortunately, is not a 10-minute piece of work).

Thanks a lot.
