[BUG-REPORT] Large Groupby Agg runs out of memory #2400

Open
meta-ks opened this issue Nov 1, 2023 · 0 comments
meta-ks commented Nov 1, 2023

Description
First, thank you for this wonderful library. It handles many pandas operations well under memory constraints (except perhaps cumsum(), which I am eagerly waiting for).
I have an Arrow file of ~8 GB, which I load into a vaex DataFrame of shape (27_416_244, 32). Available system RAM is ~8 GB. I run a groupby-agg like this:

# summary_df is a MultiIndex pandas DataFrame with 76k rows and 20 columns
index_names = list(summary_df.index.names)
strfmt = '%Y-%m-%d'
vdf['_Period'] = vdf['Date'].dt.strftime(strfmt)

gd_column_ops_map = {
    'PnL % Capital':'sum', 'PnL':'sum', '% High':'mean',
    '% Close':'mean', '% Low':'mean', 'Charges':'sum', 'Sell Val':'sum', 'Buy Val':'sum',
    'Qty':'sum', 'Cash Flow':'sum'
}
grpby_cols = index_names + ['_Period']

# The kernel crashes on the next line, after the groupby step, perhaps during agg.
grp_trades_vdf = vdf.groupby(grpby_cols, progress=True).agg(gd_column_ops_map)
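
As a rough sanity check on the sizes involved, the snippet below estimates how much memory the reported shape would need if it were fully materialised. It is only a back-of-envelope sketch: the 8 bytes per value is an assumption (the actual column dtypes are not stated here), and the helper name `dense_size_bytes` is made up for illustration.

```python
# Back-of-envelope estimate of the in-memory size of the DataFrame.
# ASSUMPTION: every column is an 8-byte numeric type (e.g. float64 or int64).
# The real dtypes are not given in this report; string columns would differ.
N_ROWS = 27_416_244      # from the reported shape
N_COLS = 32              # from the reported shape
BYTES_PER_VALUE = 8      # assumed, see above
RAM_GIB = 8              # reported available RAM, treated as GiB here


def dense_size_bytes(n_rows: int, n_cols: int, bytes_per_value: int) -> int:
    """Size of a dense n_rows x n_cols table with fixed-width values."""
    return n_rows * n_cols * bytes_per_value


total_bytes = dense_size_bytes(N_ROWS, N_COLS, BYTES_PER_VALUE)
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes = {total_gib:.2f} GiB")
print(f"fraction of RAM: {total_gib / RAM_GIB:.0%}")
```

Under that assumption the columns alone come to roughly 6.5 GiB, which is in the same range as the ~8 GB file and leaves little of the available RAM for whatever intermediate buffers the groupby needs. A memory-mapped file does not have to be resident all at once, so this is an upper-bound style estimate rather than a measurement.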

Software information

  • Vaex version
{'vaex': '4.17.0',
'vaex-core': '4.17.1',
'vaex-viz': '0.5.4',
'vaex-hdf5': '0.14.1',
'vaex-server': '0.9.0',
'vaex-astro': '0.9.3',
'vaex-jupyter': '0.8.2',
'vaex-ml': '0.18.3'}
python: 3.10
  • Vaex was installed via: pip
  • OS: Ubuntu 22
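
For anyone hitting the same limit, one possible workaround is to aggregate the data in chunks and merge the partial results, since 'sum' and 'mean' can both be rebuilt from running sums and counts. The sketch below shows the idea in plain Python only. It is not how vaex implements groupby, the row layout and the function name `aggregate_in_chunks` are hypothetical, and only two of the columns from the aggregation map are modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical, simplified stand-in for the real data: each row is
# (group_key, pnl, pct_close). In the report the key would be the tuple of
# index columns plus '_Period'.
Row = Tuple[tuple, float, float]


def aggregate_in_chunks(chunks: Iterable[List[Row]]) -> Dict[tuple, Dict[str, float]]:
    """Sum 'PnL' and average '% Close' per key, visiting one chunk at a time."""
    pnl_sum = defaultdict(float)
    close_sum = defaultdict(float)
    count = defaultdict(int)
    for chunk in chunks:
        for key, pnl, pct_close in chunk:
            pnl_sum[key] += pnl          # 'sum' only needs a running total
            close_sum[key] += pct_close  # 'mean' needs a running total ...
            count[key] += 1              # ... and a running row count
    return {
        key: {"PnL": pnl_sum[key], "% Close": close_sum[key] / count[key]}
        for key in count
    }


# Two small chunks with made-up values, just to exercise the function.
chunks = [
    [(("A", "2023-01-02"), 10.0, 1.0), (("B", "2023-01-02"), -5.0, 2.0)],
    [(("A", "2023-01-02"), 2.5, 3.0)],
]
result = aggregate_in_chunks(chunks)
print(result)
```

Memory use then scales with the number of distinct groups plus one chunk, not with the full row count. Whether that is small enough depends on how many distinct key combinations the data contains, which is not known from this report.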