support for M4 thinning of timeseries #350

Open
mattijn opened this issue Jul 2, 2023 · 5 comments

Comments

@mattijn
Contributor

mattijn commented Jul 2, 2023

I saw this page: https://observablehq.com/@uwdata/m4-scalable-time-series-visualization and noticed that it mentions a SQL-friendly version of M4 (I'm more familiar with the term 'thinning' of time series).

In Mosaic it is implemented in JavaScript around here: https://github.com/uwdata/mosaic/blob/7eb1ddaae512068fd6bb6cb42e594ef0ec2b1a1c/packages/vgplot/src/marks/ConnectedMark.js#L41-L63 with a reference to this paper: https://arxiv.org/pdf/2306.03714.pdf which also mentions this query.

Is this of interest to VegaFusion, for example in combination with the DuckDB engine? It applies to charts (at least line and area charts) that have no aggregation applied.
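
For readers unfamiliar with M4, the idea can be sketched in a few lines of plain Python: split the x-range into one bin per horizontal pixel and keep only the first, last, lowest and highest point of each bin. The function name `m4_thin` and the tuple-based data layout are assumptions made for illustration; this is not the Mosaic or VegaFusion implementation.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def m4_thin(points: Sequence[Point], width: int) -> List[Point]:
    """Keep the first, last, min-y and max-y point of every pixel column.

    `points` must already be sorted by x; `width` is the chart width in pixels.
    (Hypothetical helper, written only to illustrate the technique.)
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not points:
        return []
    x_min, x_max = points[0][0], points[-1][0]
    span = (x_max - x_min) or 1.0  # guard against a zero-width x-range
    # bins[k] holds indices into `points`: [first, last, min_y, max_y]
    bins: Dict[int, List[int]] = {}
    for i, (x, y) in enumerate(points):
        k = min(int((x - x_min) / span * width), width - 1)
        if k not in bins:
            bins[k] = [i, i, i, i]
            continue
        b = bins[k]
        b[1] = i  # input is sorted, so the latest point is the last one
        if y < points[b[2]][1]:
            b[2] = i
        if y > points[b[3]][1]:
            b[3] = i
    # De-duplicate (one point can play several roles) and restore x order.
    keep = sorted({i for b in bins.values() for i in b})
    return [points[i] for i in keep]
```

The result never holds more than four points per pixel column, so its size depends on the chart width rather than on the number of input rows.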

@jonmmease
Collaborator

Thanks for bringing this up @mattijn. I've thought a little bit about this, and I think there's a path to supporting it. Two subtleties (not blockers) have come up so far:

  1. We need to make sure this is only applied when the x-axis values are sorted (which is the default, but can be overridden with a custom order)
  2. M4 uses the horizontal screen resolution as input. This is often a fixed number that's available as the width property of the Vega spec. But we'd need to decide what to do when Vega-Lite is configured with width and height of "container".

In terms of implementation, I think what we could do is implement a new Vega transform named m4 in VegaFusion (and maybe an m4 method on the DataFrame trait). The VegaFusion planner would detect line marks that satisfy the requirements above (sorted x values and fixed width), and add m4 transforms to the server spec.
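
The two preconditions could be expressed as a small eligibility check. This is only a sketch of the decision logic: `is_sorted`, `fixed_width` and `can_apply_m4` are hypothetical names, not part of VegaFusion's planner, and including area marks alongside line marks is an assumption taken from the opening comment.

```python
from typing import Optional, Sequence, Union


def is_sorted(values: Sequence[float]) -> bool:
    """True when the x values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))


def fixed_width(width: Union[int, float, str, None]) -> Optional[int]:
    """Return the pixel width if it is a fixed number, else None.

    A Vega-Lite width of "container" (or a missing width) is treated as
    "not known up front" in this sketch.
    """
    if isinstance(width, bool):
        return None
    if isinstance(width, (int, float)) and width > 0:
        return int(width)
    return None


def can_apply_m4(mark: str, x_values: Sequence[float], width) -> bool:
    """Hypothetical planner check: connected mark, fixed width, sorted x."""
    return (
        mark in ("line", "area")
        and fixed_width(width) is not None
        and is_sorted(x_values)
    )
```

A planner following this approach would leave the chart untouched whenever the check fails, so the fallback is simply the unthinned data.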

But actually, come to think of it, the width wouldn't need to be fixed in the case of the widget renderer since the m4 process could re-run on resize.

@mattijn
Contributor Author

mattijn commented Jul 2, 2023

I think there is another use case. Besides the width, the M4 method also depends on the minimum and maximum timestamps in the view.
Time series charts often show only the latest period (e.g. year-to-date), while a longer history is available.

It would be great if M4 could be applied to this use case as well.
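
To make the dependence on the visible range concrete, here is a minimal sketch, assuming sorted numeric timestamps. The helper names `visible_slice` and `pixel_column` are made up for illustration. The same timestamp lands in a different pixel column once the view is zoomed, which is why the thinning has to be recomputed whenever the x-domain changes.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def visible_slice(xs: Sequence[float], x0: float, x1: float) -> Tuple[int, int]:
    """Index range [lo, hi) of the sorted values `xs` inside [x0, x1]."""
    return bisect_left(xs, x0), bisect_right(xs, x1)


def pixel_column(x: float, x0: float, x1: float, width: int) -> int:
    """Pixel column of `x` when the domain [x0, x1] spans `width` pixels."""
    k = int((x - x0) / (x1 - x0) * width)
    return min(max(k, 0), width - 1)


timestamps = list(range(0, 11))              # full history
lo, hi = visible_slice(timestamps, 5, 6)     # zoomed-in view
in_view = timestamps[lo:hi]                  # only these rows need thinning

col_full = pixel_column(5, 0, 10, 100)       # column in the full view
col_zoom = pixel_column(5, 5, 6, 100)        # column in the zoomed view
```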

For example this spec:

import numpy as np
import pandas as pd
import altair as alt
from vega_datasets import data

source = data.sp500()

# full range of available dates: year 2000 to 2010
print('full range:',source.date.min(), source.date.max())

# start chart with x-axis set to last period
# `value` range as datetime in milliseconds since unix epoch
x_init = pd.to_datetime(['2008-12-01', '2010-04-01']).astype(np.int64) / 1E6
interval = alt.selection_interval(encodings=['x'], bind='scales', value={"date":list(x_init)})

title=alt.Title(text=alt.expr(f'''
"FROM " + timeFormat({interval.name}["date"][0], "%B %d, %Y") + 
" TO " + timeFormat({interval.name}["date"][1], "%B %d, %Y")
'''), subtitle='range in view')

# zoom and pan
alt.Chart(source, title=title).mark_line().encode(
    x='date:T',
    y='price:Q'
).properties(
    width=600,
    height=200
).add_params(interval)

[Image: view_history]

The full range of available dates is 2000-01-01 00:00:00 to 2010-03-01 00:00:00.
Here the view starts at 2008-12-01 to 2010-04-01; we then zoom out, pan, and zoom in on an arbitrary event in early 2002.

For this dataset the time interval is 1 month, so 'thinning' or M4 reduction is not very important, but doing this with VegaFusion at interactive rates in the browser, for a time series with a frequency of e.g. 5 minutes, would be really great.
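
A rough back-of-the-envelope calculation shows why this matters at 5-minute frequency. The 10-year span is an assumption chosen to roughly match the date range of the example above (leap days ignored); the 600-pixel width is the one used in the spec.

```python
# One sample every 5 minutes.
points_per_day = 24 * 60 // 5            # 288
days = 10 * 365                          # assumed ~10 years, leap days ignored
raw_points = points_per_day * days       # 1_051_200 rows sent without thinning

# M4 keeps at most 4 points (first, last, min, max) per pixel column.
width = 600
max_m4_points = 4 * width                # 2400

reduction = raw_points / max_m4_points   # 438.0
print(f"{raw_points} rows -> at most {max_m4_points} rows ({reduction:.0f}x fewer)")
```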

@jonmmease
Collaborator

Yeah, I think interactive m4 when using the widget renderer should be possible. Agreed that this would be a really great optimization for timeseries line/area visualizations!

@mattijn
Contributor Author

mattijn commented Oct 22, 2023

I've been playing a bit with this in combination with DuckDB and JupyterChart, no VegaFusion included yet. See video:

Screen.Recording.2023-10-22.at.23.50.30.mp4

So far I have only been able to get it to work with two charts side by side, where the chart data on the right is updated based on the interactivity in the chart on the left. The source data frame contains 50M rows, so the results are promising!

For reference, here is the notebook showing how it was done: https://gist.github.com/mattijn/ac749df17bd5ed9c6bdec621f90096b3#file-altair-2023-10-22-am4-thinning-ipynb
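
For readers who do not want to open the notebook, here is a self-contained sketch of an M4-style query. It is not the query from the notebook or from Mosaic: it uses Python's built-in `sqlite3` as a stand-in for DuckDB, a group-and-join formulation, and a made-up table `series(t, v)`. The parameters `t0`, `t1` and `width` stand for the visible x-range and the chart width in pixels.

```python
import sqlite3

# Assumes t0 < t1. Rows that tie on a bin's min or max value are all kept,
# so a bin can return more than four rows when values repeat.
M4_QUERY = """
WITH binned AS (
    SELECT t, v,
           MIN(CAST((t - :t0) * :width / (:t1 - :t0) AS INTEGER),
               :width - 1) AS k          -- pixel column of each row
    FROM series
    WHERE t BETWEEN :t0 AND :t1          -- only rows in the current view
),
extremes AS (
    SELECT k, MIN(t) AS t_min, MAX(t) AS t_max,
              MIN(v) AS v_min, MAX(v) AS v_max
    FROM binned
    GROUP BY k
)
SELECT DISTINCT b.t, b.v
FROM binned AS b
JOIN extremes AS e ON b.k = e.k
WHERE b.t IN (e.t_min, e.t_max) OR b.v IN (e.v_min, e.v_max)
ORDER BY b.t
"""


def m4_rows(conn: sqlite3.Connection, t0: int, t1: int, width: int):
    """Run the sketch query for the view [t0, t1] at `width` pixels."""
    params = {"t0": t0, "t1": t1, "width": width}
    return conn.execute(M4_QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (t INTEGER, v INTEGER)")
conn.executemany(
    "INSERT INTO series VALUES (?, ?)",
    ((t, (t * 37) % 101) for t in range(10_000)),
)
rows = m4_rows(conn, 0, 9_999, 100)
print(len(rows), "rows kept out of 10000")
```

In an interactive setup, the same function would be called again with new `t0`, `t1` and `width` values every time the selection or the chart width changes.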

@jonmmease
Collaborator

This is really cool @mattijn! I hadn't thought of the idea of binding a width param; it's great that this works with JupyterChart. I'd love to have VegaFusion do this automatically some day, but in the end it would pretty much be the logic you've implemented by hand here.

I'm thinking that this approach could also be used with datashader for other visualization types. In that case the updated mark would be a base64-encoded image.
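
As a rough illustration of the image idea, the sketch below bins points into a pixel grid and encodes the counts as a grayscale PNG data URL, using only the Python standard library. It does not use datashader's API; `rasterize` and `to_png_data_url` are hypothetical helpers that stand in for what a rasterizing library would produce.

```python
import base64
import struct
import zlib
from typing import List, Sequence, Tuple


def rasterize(points: Sequence[Tuple[float, float]],
              width: int, height: int) -> List[List[int]]:
    """Count points per pixel; row 0 is the top of the image."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        col = min(int((x - x0) / ((x1 - x0) or 1) * width), width - 1)
        row = min(int((y1 - y) / ((y1 - y0) or 1) * height), height - 1)
        grid[row][col] += 1
    return grid


def _chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag and data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def to_png_data_url(grid: List[List[int]]) -> str:
    """Encode counts as an 8-bit grayscale PNG (dense = dark) data URL."""
    height, width = len(grid), len(grid[0])
    peak = max(max(row) for row in grid) or 1
    # Each scanline starts with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(255 - (255 * c // peak) for c in row) for row in grid
    )
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


points = [(float(i), float((i * 37) % 101)) for i in range(10_000)]
url = to_png_data_url(rasterize(points, width=60, height=20))
```

The resulting string could be used as the URL of an image mark and regenerated whenever the view changes, in the same way the thinned rows are regenerated in the M4 case.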
