rectY Mark is not updating bin step size as data is filtered #203
Comments
Thanks for raising the issue! You are correct that currently the bins do not update: for now the However, please note that with upcoming travel and other obligations it may be weeks or even 1-2 months before I have time to do this work. In the meantime I'd be happy to help provide guidance if anyone is interested in starting a PR. Here is where the bin transform accesses the mark's statistics: A first question is whether we should update these stats dynamically, or develop an alternative abstraction for getting the current field extents.
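As a rough illustration of the second option (a separate way to ask for the current field extents), the sketch below builds a min/max query restricted by the active filter. The function name extentQuery, the plain-string SQL, and the string form of the filter are all assumptions made for this example; they are not part of Mosaic's API, which uses its own query builder.

```javascript
// Hypothetical helper (not part of Mosaic): build a SQL string that
// computes the current extent of one column under an optional filter.
function extentQuery(table, column, filter) {
  // `filter` is assumed to be a ready-made SQL predicate string, or null.
  const where = filter ? ` WHERE ${filter}` : '';
  return `SELECT MIN("${column}") AS min, MAX("${column}") AS max`
    + ` FROM "${table}"${where}`;
}

console.log(extentQuery('flights', 'delay', 'delay < 500'));
```

Running such a query on every selection update would give the bin transform fresh extents without mutating the mark's cached statistics, at the cost of one extra query per update.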
Here's my monkeypatch. I passed the This probably isn't ideal, but it works decently in my limited testing. Note that this does result in NaN errors in DuckDB in some cases; I think this happens when the selection causes the filter to return no results, so the min/max statistics turn into NaNs somewhere and then get passed into SQL, where they have no meaning. But it doesn't stop my use case from running, so I'll leave that solution for someone else to figure out.

FilterGroup.js

function defaultUpdate(mc, clients, selection) {
  return Promise.all(Array.from(clients).map(client => {
    const filter = selection.predicate(client);
    if (filter != null) {
      // refresh field statistics for the filtered data (async, not awaited here)
      mc.updateCatalog(client, filter);
      return mc.updateClient(client, client.query(filter));
    }
  }));
}

Coordinator.js

async updateCatalog(client, filter) {
  const { catalog } = this;
  // retrieve field statistics
  const fields = client.fields();
  if (fields?.length) {
    client.fieldInfo(await catalog.queryFields(fields, filter));
  }
}

Catalog.js

async queryFields(fields, filter={}) {
  const list = await resolveFields(this, fields);
  const data = await Promise.all(list.map(f => this.fieldInfo(f, filter)));
  return data.filter(x => x);
}

Catalog.js

async fieldInfo({ table, column, stats }, filter={}) {
  const tableInfo = await this.tableInfo(table);
  const colInfo = tableInfo[column];
  // column does not exist
  if (colInfo == null) return;
  // no need for summary statistics
  if (!stats?.length) return colInfo;
  let query = summarize(colInfo, stats);
  // restrict the summary query to the currently selected rows
  query.query.where = filter || query.where;
  const result = await this.mc.query(
    query,
    { persist: true }
  );
  const info = { ...colInfo, ...(Array.from(result)[0]) };
  // coerce bigint to number
  for (const key in info) {
    const value = info[key];
    if (typeof value === 'bigint') {
      info[key] = Number(value);
    }
  }
  return info;
}
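One possible way to avoid the NaN errors mentioned above, assuming they really do come from an empty filtered result, is to ignore missing statistics and keep the previous values. The sketch below is a standalone illustration; mergeStats is a hypothetical helper and is not part of Mosaic, and it assumes the previous column info already carries usable min/max values.

```javascript
// Hypothetical helper (not part of Mosaic): merge freshly queried
// statistics into the existing column info, skipping unusable values.
function mergeStats(colInfo, row) {
  const info = { ...colInfo };
  for (const [key, value] of Object.entries(row ?? {})) {
    // coerce bigint to number, as the patch above does
    const v = typeof value === 'bigint' ? Number(value) : value;
    // an empty selection can yield null or NaN; keep the old value then
    if (v == null || (typeof v === 'number' && Number.isNaN(v))) continue;
    info[key] = v;
  }
  return info;
}

const previous = { column: 'x', min: 0, max: 100 };
console.log(mergeStats(previous, { min: NaN, max: null })); // stats unchanged
console.log(mergeStats(previous, { min: 5n, max: 50 }));    // min 5, max 50
```

Falling back to the previous extents means an empty selection renders with the old bins rather than sending NaN into SQL; whether that is the right behavior for a given plot is a design choice.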
I have a data set with long min/max tails that results in step=100,000 (from the bins() function in bin.js). As I filter data with a shared Selection, the min/max tails disappear and the resulting stats should reduce to step=10,000 or less. However, the Mark retains its original min/max stats for this channel and re-uses these in bin.js, always resulting in the same (now inappropriate) step=100,000. My bar plot therefore becomes a single large bar that covers my entire filtered data set.

What I expect to happen:
As data is filtered, the min and max statistics for that channel on that Mark should be recalculated.

How do I set the Mark to recalculate min/max for its channel as a result of filtering on the Selection?

So far I've tried updating the Mark class with a filterIndexable function that always returns true, but this has no effect.
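To illustrate why stale extents keep the step large, here is a minimal sketch of a "nice" step calculation. It is not Mosaic's actual bins() implementation: niceStep, the target of 20 bins, and the 1/2/5/10 multipliers are assumptions for illustration, and the extents are made-up numbers chosen to reproduce the step sizes reported in this issue.

```javascript
// Hypothetical sketch (not Mosaic's bins()): pick a round step size
// so that the range [min, max] is covered by at most `maxBins` bins.
function niceStep(min, max, maxBins = 20) {
  const span = max - min;
  if (!(span > 0) || !Number.isFinite(span)) return NaN;
  const raw = span / maxBins;                       // smallest usable step
  const pow = 10 ** Math.floor(Math.log10(raw));    // power of ten at or below raw
  for (const m of [1, 2, 5, 10]) {
    if (m * pow >= raw) return m * pow;             // first round step that fits
  }
  return 10 * pow;                                  // fallback for rounding edge cases
}

// Extents with long tails (unfiltered data)
console.log(niceStep(0, 1900000)); // 100000
// Extents after filtering removes the tails
console.log(niceStep(0, 190000));  // 10000
```

If the mark keeps passing the first pair of extents after filtering, the step stays at 100,000 and all remaining data falls into a single bin, which matches the behavior described here.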