
WiP: Next generation Hyperalignment API #319

Open · wants to merge 5 commits into base: master

Conversation

@mih (Member) commented May 18, 2015

This is about making the incomprehensible folding strategy shown here

http://www.pymvpa.org/examples/hyperalignment.html

much simpler to write and read.

The basic idea is to start with a dataset that has other datasets as its samples, for cases where dataset features are not congruent across, e.g., subjects. Such a dataset can have the usual attributes (e.g. chunks), so it can be fed into a cross-validation for things like leave-one-subject-out. Any measure running inside the CV can then be prepended with the new, yet-to-be-written Hyperalign() mapper, which can work with these special datasets: it computes the common space from the training set and projects input datasets into the common space when necessary.
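A minimal sketch of the proposed structure, with plain numpy object arrays standing in for a real PyMVPA Dataset (the Hyperalign() mapper does not exist yet, so the training/projection step is only indicated in comments; all names here are illustrative):

```python
import numpy as np

# Simulate per-subject datasets with incongruent feature counts
# (a real PyMVPA Dataset would also carry .fa/.a attributes)
subject_data = [np.random.rand(20, n_feat) for n_feat in (50, 60, 55)]

# The "outer" dataset: one sample per subject, stored in an object array
# so numpy does not try to stack the unequal-width matrices.
samples = np.empty(len(subject_data), dtype=object)
for i, s in enumerate(subject_data):
    samples[i] = s

# Sample attributes attached to the outer container, e.g. for
# leave-one-subject-out partitioning.
chunks = np.arange(len(subject_data))

# Leave-one-subject-out folding: train the (hypothetical) common-space
# model on all but one subject, then project the held-out one.
for test_idx in range(len(samples)):
    train = [samples[i] for i in range(len(samples)) if i != test_idx]
    test = samples[test_idx]
    # common_space = Hyperalign().train(train)   # yet-to-be-written mapper
    # projected = common_space.forward(test)
```

The object-dtype array is what allows each "sample" to have a different number of features while still living in one container that partitioners can slice.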

In essence, this PR will be about implementing this mapper and making various other bits of PyMVPA aware of dataset(datasets) in order to be able to do the right thing (TM).

As this could be implemented in a thousand different ways, I am opening this PR right at the start in order to get your comments, @yarikoptic, @nno and @swaroopgj.

mih added 3 commits May 18, 2015 14:32
Not sure if this should be in __init__, or whether we would want a
separate helper function.
@swaroopgj (Member) commented

Interesting idea of datasets as samples. Another use case might be RSA-like analyses, where subjects need not be in the same feature space.

Somewhat related to this, I implemented a Hyperalignment measure for the Searchlight Hyperalignment classes. The issues I had to deal with were 1) handling different numbers of features across subjects, 2) feature selection, and 3) excluding certain subjects from training the model.
If these can be handled outside the Hyperalignment mapper, it should be less complicated.

@nno (Contributor) commented May 18, 2015

I'm not so convinced that samples should be used to store a list of datasets.

  • it breaks the nice and consistent 'all .samples are matrices' idea
  • it does not allow for subjects being in a different feature space, as all subjects would share the same .fa (and .a).

Using lists with datasets makes more sense to me.

@mih (Member, Author) commented May 20, 2015

@nno We actually do not rely on the assumption that .samples is a matrix. If you grep for 'streamline' you can find traces of Emanuele's work on representing fiber streamlines in datasets. In that case each fiber had a different length.

PyMVPA's minimal assumption should be that .samples has len(); beyond that, most bits should work. Naturally, the more specific a piece of functionality is, the more assumptions it has to make, and this flexibility breaks down.

The main advantage of using a dataset as a container instead of a list is that we can attach attributes, use partitioners, etc., which is not easily possible with a list.

@swaroopgj

  1. handling different numbers of features across subjects

There would be no restrictions in this regard. And of course no common feature axis.

  2. feature selection

This one probably needs some changes in the code to be handled nicely for such monster datasets.

  3. excluding certain subjects from training the model

A subject would be a sample; assign an attribute and partition/split accordingly.
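The exclusion mechanism described here can be sketched with plain numpy (a stand-in for the real Dataset/partitioner machinery; the `train_ok` attribute name is hypothetical):

```python
import numpy as np

# Each sample of the outer dataset is one subject, with its own
# (unequal) number of features.
samples = np.empty(4, dtype=object)
for i in range(4):
    samples[i] = np.random.rand(10, 30 + i)

# A per-sample attribute marking which subjects may enter model training;
# a partitioner would consume such an attribute to build splits.
train_ok = np.array([True, True, False, True])

training_set = samples[train_ok]   # subjects used to train the common space
held_out = samples[~train_ok]      # subjects excluded from training
```

Because exclusion is just boolean indexing on a sample attribute, it needs no special support inside the mapper itself.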

@nno (Contributor) commented May 20, 2015

@Hanke thanks for the explanation, you've addressed my concerns.
To make sure we're on the same page: if subj_ds is a dataset whose .samples is a list of datasets, then subj_ds itself has minimal (or no) feature attributes (e.g. no .fa.voxel_indices), while each element in subj_ds.samples carries the feature attributes for its subject (including .fa.voxel_indices for fMRI datasets). Is that correct?

@mih (Member, Author) commented May 20, 2015

@nno in short: yes.

Long version: the "outer" dataset has .fa, but would only have one feature (THE dataset). .fa could be assigned, but I cannot think of a good use case. .samples of the outer dataset is an object array. If you access any element in it, you'll find a fully featured dataset with all the usual feature attributes (like voxel_indices), but also with the freedom to modify it in any way -- like in a list.
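The layout described above can be sketched like this, with dicts standing in for the inner PyMVPA datasets (the attribute layout is only an approximation of what a real Dataset carries):

```python
import numpy as np

n_subjects = 3

# The "outer" dataset: an object array with exactly one feature per
# sample -- that feature being THE per-subject dataset.
outer_samples = np.empty((n_subjects, 1), dtype=object)
for i in range(n_subjects):
    # each cell holds a fully featured per-subject "dataset" with its
    # own feature attributes (voxel_indices etc.)
    n_feat = 40 + 5 * i
    outer_samples[i, 0] = {
        'samples': np.random.rand(20, n_feat),
        'fa': {'voxel_indices': np.arange(n_feat)},
    }

# Accessing any element yields the full inner dataset, which can be
# modified freely -- just like an element of a list.
inner = outer_samples[1, 0]
```

The outer container keeps a regular (n_subjects, 1) shape, so all the usual sample-axis machinery still applies, while the inner datasets remain completely heterogeneous.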

@yarikoptic (Member) commented

travis blows for a reason:

  File "/home/travis/build/PyMVPA/PyMVPA/mvpa2/mappers/hyperalignment.py", line 13, in <module>
    from ..base import Mapper
ImportError: cannot import name Mapper

for i, s in enumerate(samples):
    temp[i] = s
samples = temp
del temp
Member commented:


is it really needed here? doesn't that dataset contain only references to the original ones, and temp would simply be a lightweight beast, so OK to be picked up by gc at any convenience?

yarikoptic and others added 2 commits May 20, 2015 22:29
BF+RF+DOC: use all(iterator) for OPT, fix unused import of Mapper, adjusted docstring
@coveralls commented

Coverage Status

Coverage increased (+0.06%) to 78.83% when pulling 86dba8b on hanke:hypalng into 2e55fd6 on PyMVPA:master.
