lawremi edited this page Feb 13, 2011 · 2 revisions

Brushing Model

A brush is a controller that consists of a Selector and an Action. A Selector selects one or more data points, which may be records in a dataset or just 2D points in the plot. Actions cause changes in the data based on the selection. An Action might modify the plotted dataset or another one.

Selector

A Selector is something that emits selections. It may provide a layer in a plot that listens to mouse events, possibly while drawing some visual cue like a rectangle. There are many other ways to make selections in the GUI: selecting a record in a table, modifying a range slider, etc. One could even imagine a selector hooked up to the data pipeline and selecting based on some set of rules. The Selector outputs a Selection, which is either a list of observations, a point, or a region.

Action

The Action receives a selection and makes some change. Note that there can be many listeners to a selection event. We limit the scope of Action to modifying one or more variables for some selection of observations in one or more datasets.

First, a region/point selection needs to be resolved to a list of observations. This list of observations can be weighted, such as by distance from a selected position in the plot. Categorical “weights” are also possible, e.g., “Full”, “Partial”, “None” selection. We then query one or more Linker objects that return a new list of possibly weighted observations. In the single dataset case, there is usually only one Linker that might bring in additional cases based on the initial selection (the Matching phase in the Wickham/Wills model). Even when linking across datasets (Propagation), the same linker may be appropriate. Its behavior may be slightly different when the source and destination datasets are different. After linking, we have a list of possibly weighted observations for one or more datasets. The final step is to scale the selection to some change in the data (Display). This would delegate to a Scale object.
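The first two steps, resolving a region to observations and attaching weights, can be sketched as follows. This is a minimal illustration in Python rather than the R functions the design has in mind; the function names, the rectangular region, and the linear distance falloff are all assumptions made for the example.

```python
import math

def resolve_region(points, xmin, xmax, ymin, ymax):
    """Return the indices of the points that fall inside a rectangular region."""
    return [i for i, (x, y) in enumerate(points)
            if xmin <= x <= xmax and ymin <= y <= ymax]

def distance_weights(points, center, radius):
    """Weight points by proximity to `center` (assumed linear falloff).

    A point at the center gets weight 1; points at or beyond `radius`
    are left out of the weighted selection entirely.
    """
    cx, cy = center
    weights = {}
    for i, (x, y) in enumerate(points):
        d = math.hypot(x - cx, y - cy)
        if d < radius:
            weights[i] = 1.0 - d / radius
    return weights

# Hypothetical plotted data.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]

# Unweighted: which observations lie inside the brush rectangle?
selected = resolve_region(points, 0.5, 2.5, 0.5, 2.5)

# Weighted: how close is each observation to the selected position?
weighted = distance_weights(points, center=(1.0, 1.0), radius=2.0)
```

Either result could then be handed to a Linker; the unweighted list is just the special case where every weight is 1.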

An important question is how to propagate the brush action: in parallel or in serial? Do the external Linkers take as input the output from the internal Linker, or do all Linkers take the same input? Certainly, it should at least be possible to link externally after the initial local linking. This is the Matching and Propagation separation.
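The two options can be contrasted with a small sketch. It is written in Python purely for illustration, and the linkers, the `group` and `cross` tables, and the function names are hypothetical stand-ins, not part of any existing package.

```python
# Hypothetical data: group membership within dataset 1, and a link table
# from dataset 1 ids to keys in dataset 2.
group = {1: "a", 2: "a", 3: "b"}
cross = {1: {"x"}, 2: {"y"}, 3: {"z"}}

def internal_linker(selection):
    """Matching: add every observation that shares a group with a selected one."""
    groups = {group[i] for i in selection}
    return {i for i, g in group.items() if g in groups}

def external_linker(selection):
    """Propagation: map dataset 1 ids to the linked keys in dataset 2."""
    linked = set()
    for i in selection:
        linked |= cross[i]
    return linked

def link_serial(selection, linkers):
    """Serial: each linker takes the previous linker's output as input."""
    for link in linkers:
        selection = link(selection)
    return selection

def link_parallel(selection, linkers):
    """Parallel: every linker takes the same initial selection as input."""
    return [link(selection) for link in linkers]

serial = link_serial({1}, [internal_linker, external_linker])
parallel = link_parallel({1}, [internal_linker, external_linker])
```

In the serial case the external linker sees observation 2, which was brought in by matching, so both `"x"` and `"y"` are reached. In the parallel case it sees only the original selection and reaches `"x"` alone.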

Wickham and Wills Model

There are four steps:

  • Selection
  • Matching selection to points in current dataset
  • Propagation to other datasets
  • Display

Mapping to our proposed model, the first step is our Selector, while the last three constitute the Action.

Selection

A selection can be of the data observations or a data region. This makes a lot of sense. They classify the n-nearest points as an observation brush. It could be viewed as a region/point brush that is then resolved to the n-nearest points.
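Viewing the n-nearest brush as a point selection that is resolved afterwards might look like the sketch below. It is Python for illustration only; `n_nearest` and its Euclidean distance are assumptions of the example.

```python
import math

def n_nearest(points, position, n):
    """Resolve a point selection to the indices of the n nearest observations.

    Indices are returned nearest first; ties keep their original order
    because Python's sort is stable.
    """
    px, py = position
    by_distance = sorted(
        range(len(points)),
        key=lambda i: math.hypot(points[i][0] - px, points[i][1] - py),
    )
    return by_distance[:n]

# Hypothetical plotted data and a selected position.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
nearest = n_nearest(points, position=(0.9, 0.0), n=2)
```

The Selector only has to emit the position; turning it into observations is left to the resolving step of the Action.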

They bring up “weighted” selection. In our model, the Selector always outputs a binary selection, whereas weighting the selection falls to the Action. This is only a minor clarification of their model.

Matching and Propagation

Matching refers to linking within a dataset, while propagation crosses datasets. At the software level, it seems that both cases can be handled by the same component, the Linker.

Display

The display phase needs to change something about the points: visibility, aesthetic, selected state, etc.

Typical Selection Modes

As noted by Wickham and Wills, a selection can be observation or region based. A mode that supports region selection usually also supports observation selection. This is simply because a region can be easily mapped to the observations within that region. A region selection can be 0D, 1D, 2D or nD (projections).

Most of these work in immediate mode: the action is activated immediately upon input. Multiple range selection typically requires retaining the selection over multiple input events, dispatching only when e.g. some button is pressed.

Observation

  • Hover over object in plot or list
  • Click on object in plot or list
  • Ctrl/shift-click in list/table for multiple selection

Region

  • Hover/click on position in plot (0D) [GGobi Identify]
  • Click-drag on axis (1D, nD)
  • Rubberband selection in list or plot (1D, 2D, nD) [GGobi brush]
  • Range slider (1D)
  • Select categorical levels (combo box, search, etc) (1D)

Typical Linkers

We could classify linkers in many ways. Wickham/Wills bring up cardinality (1-1, 1-many, many-1, many-many). Another important categorization is the type of weighting output by the linker. Unweighted linking is the most common, but continuous weights are useful e.g. for the proximity brush. And categorical weights handle the tristate case of many-1 linking.

Some examples:

  • Relational/SQL linking. Here we mean joining two datasets together. A third table can specify more complex links and include weights. All other linking strategies fall into this one, although one might not actually implement them this way.
  • GGobi categorical linking. Specialization of relational linking.
  • Graphical linking, where edge weights can be selection weights. Great when there is a natural graphical structure.
  • Distance-based linking, with the distances serving as continuous weights. Typically relies on a third dataset (like a graph) to calculate distances across datasets.
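As an illustration of categorical weights in the many-1 case, the sketch below links selected child records to their parents and reports a tristate for each parent. It is Python for illustration only; `link_many_to_one` and the `parent_of` link table are hypothetical names.

```python
def link_many_to_one(selected, parent_of):
    """Link selected children to parents with a Full/Partial/None state.

    `parent_of` plays the role of a link table mapping each child id
    to its parent id.
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)

    state = {}
    for parent, kids in children.items():
        hits = sum(1 for k in kids if k in selected)
        if hits == 0:
            state[parent] = "None"
        elif hits == len(kids):
            state[parent] = "Full"
        else:
            state[parent] = "Partial"
    return state

# Hypothetical link table: five child records belonging to three parents.
parent_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
states = link_many_to_one({1, 2, 3}, parent_of)
```

A Scale could then treat the three levels differently, for example by drawing partially selected parents in a lighter shade.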

Typical Scales

A scale could be implemented as a pipeline that usually feeds off of some plotted dataset but operates in reverse. The reverse pipeline allows transformation of the change before it reaches the root. It needs to map the output of the linker, i.e., the logical selection state or continuous distance, back into the data.

Selection

Set logical attribute on selected point(s) [GGobi Identify]. This could result in an aesthetic change or, for example, a plot could rescale itself to focus on the selected points.

Aesthetic

Usually shape, size, color, etc [GGobi Brush] or even things like position in a graph layout.

Filter

Change a variable considered by a filter.
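The selection and aesthetic scales above can be sketched as plain functions from linker output to new column values. This is an illustrative Python sketch; the function names, the brush color, and the size formula are assumptions rather than anything defined by cranvas or the scales package.

```python
def scale_to_selected(weights, n):
    """Selection scale: a logical attribute that is True where weight > 0."""
    return [weights.get(i, 0.0) > 0.0 for i in range(n)]

def scale_to_color(colors, weights, brush_color="yellow"):
    """Aesthetic scale: recolor every observation with a positive weight."""
    return [brush_color if weights.get(i, 0.0) > 0.0 else c
            for i, c in enumerate(colors)]

def scale_to_size(weights, n, base=1.0, extra=2.0):
    """Aesthetic scale: grow point size in proportion to the weight."""
    return [base + extra * weights.get(i, 0.0) for i in range(n)]

# Hypothetical linker output: observation index -> weight.
weights = {0: 1.0, 2: 0.5}
colors = ["gray", "gray", "gray"]

flags = scale_to_selected(weights, 3)
new_colors = scale_to_color(colors, weights)
sizes = scale_to_size(weights, 3)
```

Each function returns a new column instead of modifying the data in place, so the change can still be transformed on its way back through the reverse pipeline.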

Software Design

Selector

The Selector should fit the MVC pattern. There is a view of the selection (e.g., the brush rectangle) and event listeners forming the controller. In qtpaint, the layer would play both roles. The underlying model would indicate, e.g., whether a point is selected or not. This makes sense in both the GUI context (QItemSelectionModel, GtkTreeSelection) and in programs like GGobi (pts_under_brush). There would be two types of selection model: region and observation. The selection itself can take many forms: a logical vector, a numeric vector of weights, or a matrix for a region. The initial selection by the user is always unweighted.
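A minimal observation selection model with listeners might look like the following. The sketch is in Python for illustration; the class and method names are hypothetical and are not the API of qtpaint, QItemSelectionModel, or GtkTreeSelection.

```python
class ObservationSelectionModel:
    """Holds an unweighted selection as a logical vector and notifies listeners."""

    def __init__(self, n):
        self.selected = [False] * n
        self._listeners = []

    def connect(self, listener):
        """Register a callable that receives the logical vector on each change."""
        self._listeners.append(listener)

    def select(self, indices):
        """Replace the selection, then notify every listener."""
        chosen = set(indices)
        self.selected = [i in chosen for i in range(len(self.selected))]
        for listener in self._listeners:
            listener(self.selected)

# The controller (e.g., a brush layer) calls select(); Actions listen.
events = []
model = ObservationSelectionModel(4)
model.connect(events.append)
model.select([1, 3])
```

A region selection model would follow the same pattern, with the logical vector replaced by the region's coordinates.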

Action

In terms of software design, the Action links the Selector, specifically the SelectionModel, with the pipeline. Is the Action just an extension of the pipeline, operating in reverse and branching out to other datasets? There is a certain elegance to this, in that the pipeline for every dataset is merged. The weird thing is that the source of the selection is the user, not the underlying data.

A more natural selection pipeline would root at the initial selection, and the Scale would listen to the final selection and pass some sort of aesthetic change off to the data pipeline. The Linker(s) would be implemented in the selection pipeline. The pipeline would be responsible for converting regions to observations and weighting the selection of observations. The resolving and linking are implemented by simple R functions.

                 selection pipeline    scale
user-event => region => obs => data1-sel => data1-change
                          `-=> data2-sel => data2-change
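The flow in the diagram can be traced with a few plain functions. This is a Python sketch for illustration only (the design itself calls for R functions); the dataset layout, the `id` key used for linking, and all function names are assumptions.

```python
def resolve(region, xy):
    """region => obs: indices of points inside (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = region
    return {i for i, (x, y) in enumerate(xy)
            if xmin <= x <= xmax and ymin <= y <= ymax}

def link_by_id(obs, source_ids, target_ids):
    """obs => data2-sel: select target rows whose id matches a selected source row."""
    keys = {source_ids[i] for i in obs}
    return {j for j, key in enumerate(target_ids) if key in keys}

def scale_selected(sel, n):
    """sel => change: a logical column marking the selected observations."""
    return [i in sel for i in range(n)]

def brush(region, data1, data2):
    """Run the whole selection pipeline for one user event."""
    obs = resolve(region, data1["xy"])
    sel1 = obs                                   # identity link within data1
    sel2 = link_by_id(obs, data1["id"], data2["id"])
    return (scale_selected(sel1, len(data1["id"])),
            scale_selected(sel2, len(data2["id"])))

# Hypothetical datasets: data1 is plotted, data2 shares ids with it.
data1 = {"xy": [(0, 0), (1, 1), (4, 4)], "id": ["a", "b", "c"]}
data2 = {"id": ["b", "b", "c", "a"]}

change1, change2 = brush((0.5, 0.5, 2, 2), data1, data2)
```

Only the selection travels through these steps; the data columns are read but never transformed, which matches the point that the brush pipeline does not need a full mutaframe.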

We say “pipeline”, but the selection model is too simple to deserve a mutaframe. Of course, the actual data will need to be accessible throughout; it’s simply not clear that all the other variables need to be there. Certainly, this brush pipeline will not be transforming/filtering the data columns. Would something want to view the selection as a variable? It could always be made an attribute in the data.

Brush memory

Could be implemented through a general implementation of pipeline memory. Each operation performed on a pipeline could be serialized, e.g., as R code.
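One way to picture such a memory is a log of operations that can be serialized and replayed. The sketch below uses Python and JSON instead of serialized R code, and the `BrushMemory` class, the `set_color` operation, and the handler table are all hypothetical.

```python
import json

class BrushMemory:
    """Records pipeline operations so they can be serialized and replayed."""

    def __init__(self):
        self.log = []

    def record(self, op, **args):
        """Append one operation, identified by name, with its arguments."""
        self.log.append({"op": op, "args": args})

    def dump(self):
        """Serialize the recorded operations to a JSON string."""
        return json.dumps(self.log)

    @staticmethod
    def replay(text, handlers):
        """Re-run serialized operations, in order, against a handler table."""
        for entry in json.loads(text):
            handlers[entry["op"]](**entry["args"])

# Record two brush operations, then replay them onto a fresh state.
memory = BrushMemory()
memory.record("set_color", ids=[1, 2], color="red")
memory.record("set_color", ids=[2], color="blue")

state = {}

def set_color(ids, color):
    for i in ids:
        state[i] = color

BrushMemory.replay(memory.dump(), {"set_color": set_color})
```

Because the operations are replayed in order, later brushes overwrite earlier ones just as they did interactively.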

Package

Where does all of this go? A generic brushing package will not completely implement any selectors. The view and controller are the responsibility of the plotting package, GUI, etc. A selection data model is more general than a brushing package. The Actions both link and scale. Linking seems to fall under the category of general data manipulation and thus could be in plumbr. Scaling to aesthetics might belong in the individual graphics packages, like cranvas, since they all have their own aesthetic rules. The scales package would of course help with this. Thus, it looks like a split between plumbr and cranvas.