feat: introduce eager loading functions #147
Signed-off-by: Luka Peschke <luka.peschke@toucantoco.com>
05d5b8a to 5026389
Some work is still required in calamine: tafia/calamine#409
Okay well just noticed that the API changed so we actually need to use
Glad to see tafia/calamine#409 has been merged. Hopefully we get a new release soon 👍
new data
New benchmark looks great 😃
calamine 0.25.0 should be released soon, meaning I should finally be able to finish this 🙂 tafia/calamine#435
What
This introduces eager loading functions that make use of calamine's new `DataTypeRef`. This prevents some allocations, resulting in a lower memory footprint.
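To illustrate the idea, here is a minimal, std-only sketch of how a borrowed cell type can avoid per-cell string allocations, and why the loading function then has to consume the cells eagerly. Everything in it (`Sheet`, `RawCell`, `CellRef`, `load_float_column`, `load_string_column`) is a hypothetical stand-in made up for this example; it is not calamine's or this project's actual API.

```rust
// Hypothetical stand-ins for illustration only; NOT calamine's real types.

/// Raw cell as stored by the (hypothetical) reader: strings are indices
/// into a shared string table instead of owned `String`s.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RawCell {
    Int(i64),
    Float(f64),
    SharedString(usize),
    Empty,
}

/// Borrowed view of a cell. The lifetime ties string cells to the sheet,
/// so yielding one does not allocate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CellRef<'a> {
    Int(i64),
    Float(f64),
    Str(&'a str),
    Empty,
}

pub struct Sheet {
    shared_strings: Vec<String>,
    cells: Vec<RawCell>, // row-major
    width: usize,
}

impl Sheet {
    pub fn new(shared_strings: Vec<String>, cells: Vec<RawCell>, width: usize) -> Self {
        assert!(width > 0 && cells.len() % width == 0, "cells must fill whole rows");
        Sheet { shared_strings, cells, width }
    }

    /// Iterate over borrowed cells; no string is copied here.
    pub fn cells<'a>(&'a self) -> impl Iterator<Item = CellRef<'a>> + 'a {
        self.cells.iter().map(move |c| match *c {
            RawCell::Int(i) => CellRef::Int(i),
            RawCell::Float(f) => CellRef::Float(f),
            RawCell::SharedString(idx) => CellRef::Str(&self.shared_strings[idx]),
            RawCell::Empty => CellRef::Empty,
        })
    }
}

/// Eagerly load one column as floats. The borrowed cells are consumed
/// inside this function, so the result carries no lifetime.
pub fn load_float_column(sheet: &Sheet, col: usize) -> Vec<Option<f64>> {
    sheet
        .cells()
        .skip(col)
        .step_by(sheet.width)
        .map(|cell| match cell {
            CellRef::Int(i) => Some(i as f64),
            CellRef::Float(f) => Some(f),
            // In this sketch, strings and empty cells become nulls.
            CellRef::Str(_) | CellRef::Empty => None,
        })
        .collect()
}

/// Eagerly load one column as strings: one allocation per string cell,
/// made directly into the final column rather than into an intermediate
/// owned cell first.
pub fn load_string_column(sheet: &Sheet, col: usize) -> Vec<Option<String>> {
    sheet
        .cells()
        .skip(col)
        .step_by(sheet.width)
        .map(|cell| match cell {
            CellRef::Str(s) => Some(s.to_owned()),
            _ => None,
        })
        .collect()
}
```

The returned `Vec`s are fully owned, so they could be handed to a binding layer that cannot express Rust lifetimes, whereas a lazy iterator of `CellRef<'a>` could not.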
Caveats
The API is kinda rough for now; it will probably need some cleaning (I mostly wanted to check whether the memory gain was interesting here).
The functions need to be eager because `DataTypeRef` has an explicit lifetime, which is not allowed by PyO3 (lifetimes are hard to enforce on the Python side: https://pyo3.rs/v0.20.0/class.html#no-lifetime-parameters).
In order for this to work, some changes are needed in calamine, and we don't know if this is something the library maintainers had in mind. PR and discussion: refactor: make `DataTypeRef` public and introduce a `DataTypeTrait` trait (tafia/calamine#390)
Gains
While the speed stays roughly the same (it was even 3~5% faster on my machine in several tests), the memory footprint decreases by almost 25%. This means that we're almost as good as pandas memory-wise 🥳 (they still beat us by a few MB), while being about 10 times faster.
Before
After
Pandas