give primitive types a faster key&value compare, copy, and zero methods in TypedDict #9520
Thanks for your contribution, @dlee992. Ping me once the PR is ready and I can review it.
Thanks @guilhermeleobas! I'm not familiar with the IR builder, so your comments are valuable to me. I will address them one by one.
Off topic: I have often seen this network error in CI recently, but normal users don't have permission to rerun a single job. Perhaps someone could add retries=2 to the CI config when this kind of HTTP error is detected?
I think this is ready for review. I will add a release note soon and post a perf test with results.
Here is a mini-benchmark:

    import os
    import random
    import time
    from numba import njit, typed, types

    N = 100000

    @njit
    def koo():
        # Build a dict with N integer keys and random integer values.
        keys = [i for i in range(N)]
        values = [random.randint(1, 100) for _ in range(N)]
        d = dict(zip(keys, values))
        # Insert N more entries under keys outside the initial range.
        for _ in range(N):
            d[random.randint(N+1, N+N)] = random.randint(1, 100)
        # Look up N random existing keys.
        tot = 0
        for _ in range(N):
            tot += d[random.randint(0, 100)]
        # for i in range(N):
        #     del d[i]
        return tot

    koo()  # first call triggers compilation, so it is not timed
    start = time.time()
    for _ in range(100):
        koo()
    print(time.time() - start)

On my machine, the ratio is 0.88.
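A ratio like the one quoted above could be obtained by timing the same workload on both builds and dividing the results. The following is only a minimal sketch: it assumes the ratio means "time with this PR divided by time on main", which the comment does not state, and `best_of` and `speed_ratio` are hypothetical helper names that are not part of Numba.

```python
import time


def best_of(fn, repeats=5, calls=100):
    """Return the best wall-clock time in seconds for `calls` invocations of fn."""
    fn()  # warm-up call so one-time costs (e.g. JIT compilation) are not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(calls):
            fn()
        best = min(best, time.perf_counter() - start)
    return best


def speed_ratio(candidate_seconds, baseline_seconds):
    """Candidate time over baseline time; below 1.0 means the candidate is faster."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return candidate_seconds / baseline_seconds
```

Running `best_of(koo)` once per build and passing both results to `speed_ratio` would give a single comparable figure. Taking the best of several repeats is one common way to reduce noise from other processes; it is a design choice here, not something the thread prescribes.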
But I found a more fundamental limitation in the current Numba implementation. In theory, I want type-based operation dispatching for both primitive types and container types (the latter come in two kinds: with meminfo and without meminfo). My first thought was to implement this template in C, but that does not look easy. So in this PR I implemented the template for primitive types with the llvmlite IR builder. However, the actual call chain is
The two call frames make execution slower, given
Any idea how to implement this template for primitive types on the C side so that it can easily interact with Numba-generated LLVM modules/functions?
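The type-based dispatch idea in this comment can be illustrated in plain Python. This is only a conceptual sketch and not how Numba implements it (the PR works at the llvmlite IR and C level); `TypeOps`, `PRIMITIVE_OPS`, `generic_ops`, and `ops_for` are hypothetical names introduced for illustration.

```python
import copy
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TypeOps:
    """The three per-type operations discussed in this PR."""
    equal: Callable[[Any, Any], bool]
    copy: Callable[[Any], Any]
    zero: Callable[[], Any]


# Fast paths: primitives are compared by value, copied by plain assignment,
# and zeroed with a constant, so no generic machinery is needed.
PRIMITIVE_OPS = {
    int: TypeOps(operator.eq, lambda v: v, lambda: 0),
    float: TypeOps(operator.eq, lambda v: v, lambda: 0.0),
    bool: TypeOps(operator.eq, lambda v: v, lambda: False),
}


def generic_ops(py_type):
    """Slow path for everything else (e.g. containers): deep copy on copy."""
    return TypeOps(equal=operator.eq, copy=copy.deepcopy, zero=py_type)


def ops_for(py_type):
    """Pick the specialised operations when they exist, else the generic ones."""
    return PRIMITIVE_OPS.get(py_type) or generic_ops(py_type)
```

For example, `ops_for(int).zero()` returns `0` through the fast path, while `ops_for(list).copy(x)` falls back to a deep copy. The lookup happens once per type rather than once per element, which is the property a compile-time template would also provide.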
Hi @dlee992, I can review this by the end of the week. Don't worry about the CI failures that are not directly related to your PR.
@guilhermeleobas, cool! All checks have passed this time. A concern: as I mentioned in a previous comment
Update after checking the demo notebook in the PIXIE repo: I feel PIXIE can't solve this issue very well either, but I'm not entirely sure.
Hi @dlee992, your code seems to be ok, and it gives speedup in the example you provided. I have some comments and a couple of questions. Ping me once you're done and I can review it again.
Not sure what changed, but the mini-benchmark result is now 0.7 vs 1.4, a 2x speedup. Can we put this into
Gentle ping @guilhermeleobas, could you review this again? BTW, I noticed related
Looks like we can't put this into
You reviewed this PR very quickly compared to my other PRs! :)
If you need a review on any other PR, feel free to ping me.
Since this touches part of the C code, it would be good if someone else could review it as well. Meanwhile, do you have numbers on how fast these functions are? If so, could you also share the code you're using to benchmark them?
Here is a mini-benchmark (I guess you missed this?) #9520 (comment)
Oops, my bad. I'll try it locally. Edit: here are the numbers I get locally with your benchmark: main: ~0.75
fixes #9519