jitclass appraisal and performance revisit request. #9515
Comments
@dgrigonis, could you share the code you used to build the table presented above?
Can't do the exact same code, but for quick reproducibility, this looks almost the same:

```python
from random import random
from statistics import mean, stdev
from time import sleep
from timeit import timeit

from terminaltables import SingleTable


def timeit_lambda_dict(lambdas, repeats=5, n=100, u='µs', std=False, colnames=()):
    mult = {'s': 1, 'ms': 1e3, 'µs': 1e6, 'ns': 1e9}
    m = mult[u]
    rows = [[f'units: {u}'] + list(colnames)]
    for k, values in lambdas.items():
        row = [k]
        for v in values:
            reps = [timeit(v, number=n) / n * m for _ in range(repeats)]
            s = f'{int(mean(reps)):>4}'
            if std:
                s += f' ± {int(stdev(reps)):>3}'
            row.append(s)
        rows.append(row)
    table = SingleTable(rows, title=f'{repeats} repeats, {n:,} times')
    print(table.table)


# `lambdas` is defined elsewhere: a dict mapping row labels to lists of
# zero-argument callables, one per column.
timeit_lambda_dict(lambdas, n=1000, colnames=['py', 'nb'], u='ns')
```
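For readers without `terminaltables`, a hypothetical stdlib-only sketch of how the `lambdas` argument is presumably structured, and of the same timing loop (the label and placeholder callables here are illustrative, not from the original benchmark):

```python
from statistics import mean
from timeit import timeit

# Assumed shape of `lambdas`: row label -> list of zero-arg callables,
# one per column (e.g. a pure-Python variant vs. a numba variant).
lambdas = {
    'len call': [lambda: len('x'), lambda: len('xyz')],  # placeholders
}

# Minimal rendering without terminaltables: average per-call time in ns.
results = {}
for label, funcs in lambdas.items():
    results[label] = [
        mean(timeit(f, number=1000) / 1000 * 1e9 for _ in range(5))
        for f in funcs
    ]
    print(label, [f'{int(t)} ns' for t in results[label]])
```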
This is just a guess. The first time you call, Numba has to compile the Python code down to LLVM, and this is quite expensive, especially with a benchmark whose runtime is tiny (on the order of nanoseconds). The second table shows the result of repeating the execution 100 times instead of just 5. Also, I think there's an overhead in calling the compiled function that is expensive compared to your example.
Yeah, I think I did do a pre-run in my benchmarks. There is one very obvious, straightforward problem. I am not sure about the others, but a simple class method seems to behave the same. I wonder if both can be improved to match the call time of a simple function. Also, to add: to see the result without compilation, just run the table twice in the same script; first run everything a single time, then do the proper run after. However, your second table with 100 × 1000 seems to have managed to show a near-true picture.
Hello,
First of all, I would like to take a moment to express my appreciation for jitclass. It is the only way to emulate coroutines that take in values.
Given the various complications of passing a function as an argument to a numba function, I found this method very attractive for certain kinds of cases, i.e. using jitclass to emulate a loop that stops from time to time and asks for new values.
E.g.
Although there is function-call overhead, this still increases performance for certain cases while factoring out operations that are best left outside numba.
Having said that, my request:
I would like to ask for some optimizations for jitclass.
And some observations:
In short: properties are faster than pure Python, but static methods and instance methods are unbelievably slow.
`func` below is an `njit` function analogous to `stat`, a static method of a `jitclass` object. The performance of these two should be similar, but it is very different. However, the main bottleneck for my case is `meth`, i.e. instance methods. More complex methods are OK, as the call overhead is much smaller compared to the runtime of the body (such as `incr(ement)` below). However, to gain the most from applications such as the one described above, the overhead needs to be optimized. See code below: