We should build a polars instrumentation #71
Comments
So we probably hook into |
cc @ritchie46 in case you have any thoughts? |
To be clear, that integration doesn't have to be logfire specific in any way. It could just be something like:

    class PolarsTrace:
        def on_start(self, **payload) -> None: ...
        def on_end(self, **payload) -> None: ...

    class PolarsTracer:
        def start_span(self) -> PolarsTrace: ...

I guess the question is what goes in |
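A minimal sketch of how an interface shaped like the one above might be driven. The `run_traced` helper, the `RecordingTracer` classes, and the payload keys (`name`, `error`) are hypothetical names chosen for illustration; none of this is an existing polars or logfire API.

```python
from typing import Any, Callable


class PolarsTrace:
    """One traced operation; receives a payload when it starts and when it ends."""

    def on_start(self, **payload: Any) -> None: ...
    def on_end(self, **payload: Any) -> None: ...


class PolarsTracer:
    def start_span(self) -> PolarsTrace:
        return PolarsTrace()


class RecordingTrace(PolarsTrace):
    """Example trace that only stores the payloads it is given."""

    def __init__(self, events: list[tuple[str, dict[str, Any]]]) -> None:
        self.events = events

    def on_start(self, **payload: Any) -> None:
        self.events.append(("start", payload))

    def on_end(self, **payload: Any) -> None:
        self.events.append(("end", payload))


class RecordingTracer(PolarsTracer):
    def __init__(self) -> None:
        self.events: list[tuple[str, dict[str, Any]]] = []

    def start_span(self) -> PolarsTrace:
        return RecordingTrace(self.events)


def run_traced(tracers: list[PolarsTracer], name: str, fn: Callable[[], Any]) -> Any:
    """Hypothetical driver: what a library-side hook could do around one operation."""
    spans = [t.start_span() for t in tracers]
    for span in spans:
        span.on_start(name=name)
    try:
        result = fn()
    except Exception as exc:
        # Close every span even when the operation fails, then re-raise.
        for span in spans:
            span.on_end(name=name, error=repr(exc))
        raise
    for span in spans:
        span.on_end(name=name, error=None)
    return result


tracer = RecordingTracer()
value = run_traced([tracer], "collect", lambda: sum([1, 2, 3]))
```

Keeping the payload as keyword arguments means the library could add fields later without breaking existing tracers, at the cost of a loosely typed contract.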
I am not entirely sure I understand what a trace is exactly yet. Things that might be interesting.
Other than that, there is a hook where you can get hold of the IR post-optimization (pola-rs/polars#15972), though this is very much leaking internals. We add it so that we can hook |
A trace is really similar to a log statement. The only differences are that it has a start and end (and hence a duration) and a context (where in the execution of the program it started and where it ended). The

    from typing import Any

    import polars as pl


    class Tracer:
        def on_collect(self, input: pl.LazyFrame, output: pl.DataFrame, profile: pl.DataFrame, plan: dict[str, Any]) -> None:
            # this would be sent to a remote server not just printed
            print(profile)


    def _collect_patch(df: pl.LazyFrame, *args: Any, **kwargs: Any) -> pl.DataFrame:
        # capture the serialized plan, then run the query through profile() to get timings
        plan = df.serialize()
        res, stats = df.profile(*args, **kwargs)
        # notify every registered tracer with the input, result, timings and plan
        for tracer in pl.tracers:
            tracer.on_collect(df, res, stats, plan)
        return res


    # monkey patch: register the tracers on the module and replace collect()
    pl.tracers = [Tracer()]
    pl.LazyFrame.collect = _collect_patch

    df = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}).lazy()
    print(df.sort('a').collect())

The idea would be for polars to provide a hook so that we don't need to monkey patch like this. I assume a LazyFrame computes a plan on each call chain (e.g. Is it possible to get similar information for each step of execution to a |
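The description of a trace in the comment above (a log record with a start, an end, and a position in the program's execution) can be illustrated with a small standard-library sketch. The `Span` class, the `span` helper, and the `finished` list are hypothetical names for illustration only; they are not part of polars or logfire, and real tracing libraries record considerably more.

```python
import time
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Span:
    name: str
    parent: Optional["Span"]  # the context: which span was active when this one started
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    @property
    def duration(self) -> Optional[float]:
        # Only defined once the span has ended.
        return None if self.end is None else self.end - self.start


# Tracks the currently active span so nested spans can find their parent.
_current: ContextVar[Optional[Span]] = ContextVar("current_span", default=None)
finished: list[Span] = []


@contextmanager
def span(name: str) -> Iterator[Span]:
    s = Span(name=name, parent=_current.get())
    token = _current.set(s)
    try:
        yield s
    finally:
        s.end = time.perf_counter()
        _current.reset(token)  # restore the previously active span
        finished.append(s)


# Stand-in workload: an outer "collect" span containing an inner "sort" span.
with span("collect") as outer:
    with span("sort") as inner:
        sorted([3, 1, 2])
```

Inner spans finish before the spans that contain them, so `finished` ends up ordered from innermost to outermost, and each span's `parent` link is enough to rebuild the nesting afterwards.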
it would give you query/operation times, and the query plan, see this tweet 🐦.
Richie says: