remove numba dependency #753
Comments
I love this. I think we should move forward with the enhancement. Regarding the second drawback, I agree with the nightly wheels option. Adopting a compiled alternative will also force us to make releases more often. 🙌 Regarding the first drawback, we could still have some models relying on numba as an optional dependency. For example, if we release a new model written in numba, it will only be available when installing …
I used the following to profile the current compilation times:

```python
import operator
import os

# Unset NIXTLA_NUMBA_CACHE so numba's on-disk cache isn't used and every
# function is compiled from scratch. This must run before statsforecast is imported.
os.environ.pop('NIXTLA_NUMBA_CACHE', None)

from collections import defaultdict

from numba.core.event import install_recorder

from statsforecast.core import StatsForecast
from statsforecast.models import (
    GARCH,
    TBATS,
    AutoARIMA,
    AutoCES,
    AutoETS,
    AutoTheta,
    SimpleExponentialSmoothing,
)
from statsforecast.utils import generate_series

data = generate_series(2)
models = [
    AutoARIMA(season_length=7),
    AutoCES(season_length=7),
    AutoETS(season_length=7),
    AutoTheta(season_length=7),
    SimpleExponentialSmoothing(alpha=0.1),
    GARCH(),
    TBATS(seasonal_periods=7),
]
sf = StatsForecast(models=models, freq='D')

# Record the start and end of every numba compilation triggered while forecasting.
with install_recorder("numba:compile") as rec:
    forecast = sf.forecast(df=data, h=7)

# Pair start/end timestamps per dispatcher (one per jitted function).
# If a function is compiled for several signatures, only the last pair is kept.
events = defaultdict(dict)
for ts, event in rec.buffer:
    stage = 'start' if event.is_start else 'end'
    events[event.data['dispatcher']][stage] = ts

comp_times_ms = []
for fn, times in events.items():
    module = fn.py_func.__module__
    if not module.startswith('statsforecast'):
        continue
    name = f'{module}.{fn.__name__}'
    # Timestamps are floats in seconds. Use the full difference:
    # timedelta.microseconds would drop the whole-seconds part.
    time_in_ms = round((times['end'] - times['start']) * 1000)
    comp_times_ms.append((name, time_in_ms))
top_fns = sorted(comp_times_ms, key=operator.itemgetter(1), reverse=True)

# Aggregate by submodule, e.g. 'statsforecast.ets.<fn>' -> 'ets'.
times_by_module = defaultdict(int)
for fn, ms in top_fns:
    times_by_module[fn.split('.')[1]] += ms
top_modules = sorted(times_by_module.items(), key=operator.itemgetter(1), reverse=True)
print(top_fns)
print(top_modules)
```

And got the following results:

Times in milliseconds by function:
Times in milliseconds by module:

```
[('theta', 5673),
 ('arima', 4938),
 ('ces', 4828),
 ('ets', 4507),
 ('tbats', 1476),
 ('garch', 1099),
 ('models', 286)]
```

So I believe we can migrate them in that order (I already migrated ETS in #757 because I profiled this wrong xD), but we can continue with Theta next.
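The per-function numbers come from pairing each dispatcher's start and end events. Two cases complicate this:

- numba can compile a jitted callee while typing its caller, so compilation events may be nested.
- A function compiled for several signatures produces several start/end pairs.

Below is a minimal, numba-independent sketch of an aggregation that handles both cases. It assumes the events have already been reduced to `(timestamp_in_seconds, is_start, qualified_name)` tuples, for example from `rec.buffer` using `event.is_start` and `event.data['dispatcher']` as in the profiling script. `exclusive_compile_times` and `totals_by_module` are hypothetical helpers, and the event list is made-up sample data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, bool, str]  # (timestamp in seconds, is_start, qualified name)


def exclusive_compile_times(events: Iterable[Event]) -> Dict[str, float]:
    """Sum exclusive compilation time in ms per function.

    Time spent in nested compilations is subtracted from the parent, and
    repeated compilations of the same function (other signatures) are added up.
    """
    totals: Dict[str, float] = defaultdict(float)
    stack: List[list] = []  # [name, start_ts, seconds spent in nested compilations]
    for ts, is_start, name in events:
        if is_start:
            stack.append([name, ts, 0.0])
            continue
        open_name, start, nested = stack.pop()
        if open_name != name:
            raise ValueError(f"unbalanced events: {open_name!r} vs {name!r}")
        elapsed = ts - start
        totals[name] += (elapsed - nested) * 1000
        if stack:
            # The parent's wall time includes this whole compilation.
            stack[-1][2] += elapsed
    return dict(totals)


def totals_by_module(times_ms: Dict[str, float]) -> Dict[str, float]:
    """Group 'statsforecast.<module>.<fn>' names by <module>."""
    by_module: Dict[str, float] = defaultdict(float)
    for name, ms in times_ms.items():
        by_module[name.split('.')[1]] += ms
    return dict(by_module)


# Made-up sample events for illustration only.
events = [
    (0.0, True, "statsforecast.ets.outer"),
    (0.2, True, "statsforecast.ets.inner"),   # compiled while typing `outer`
    (0.5, False, "statsforecast.ets.inner"),
    (1.5, False, "statsforecast.ets.outer"),
    (2.0, True, "statsforecast.ets.inner"),   # recompiled for another signature
    (2.1, False, "statsforecast.ets.inner"),
]
per_fn = exclusive_compile_times(events)
print({name: round(ms) for name, ms in per_fn.items()})
print({mod: round(ms) for mod, ms in totals_by_module(per_fn).items()})
```

Using exclusive rather than inclusive times keeps a nested compilation from being counted in both the callee and its caller when the times are summed per module.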
Description
We've relied heavily on numba to speed up our models. However, we don't actually need its JIT compilation: all the code that uses it is defined inside the library, so it could be compiled ahead of time instead.
Replacing numba-jitted code with a compiled alternative (C++ or Rust, for example) would provide the following benefits:
And the following drawbacks:
Use case
This would benefit the development process, since even with the cache enabled it can take a couple of seconds to run jitted functions for the first time.
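The first-call cost comes from compiling lazily: a JIT dispatcher does its work on the first call for each new combination of argument types. Below is a toy, pure-Python model of that behavior, where `time.sleep` stands in for compilation (or for loading a cached entry). `ToyLazyDispatcher` is a hypothetical illustration, not how numba is implemented internally.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ToyLazyDispatcher:
    """Toy JIT dispatcher: 'compiles' once per argument-type signature."""

    def __init__(self, func: Callable[..., Any], compile_cost_s: float = 0.05) -> None:
        self.func = func
        self.compile_cost_s = compile_cost_s
        self.compiled: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    def __call__(self, *args: Any) -> Any:
        signature = tuple(type(a) for a in args)
        if signature not in self.compiled:
            # Simulated compilation cost, paid on the first call per signature.
            time.sleep(self.compile_cost_s)
            self.compiled[signature] = self.func
        return self.compiled[signature](*args)


def add(a, b):
    return a + b


jit_add = ToyLazyDispatcher(add)

# The first (int, int) and the first (float, float) calls are slow; the repeat is fast.
for args in [(1, 2), (3, 4), (1.0, 2.0)]:
    t0 = time.perf_counter()
    result = jit_add(*args)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    print(args, result, f"{elapsed_ms:.1f} ms")
```

An ahead-of-time compiled extension would move this cost to build time, so a fresh process (a test run or a new deployment) wouldn't pay it.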
Deployments would also be smoother because either: