
WiP: Next generation Hyperalignment API #319

Open · wants to merge 5 commits into base: master

Conversation

@mih (Member) commented May 18, 2015

This is about making the incomprehensible folding strategy shown here

http://www.pymvpa.org/examples/hyperalignment.html

much simpler to write and read.

The basic idea is to start with a dataset that has other datasets as its samples, for cases where dataset features are not congruent across, e.g., subjects. Such a dataset can have the usual attributes (e.g. chunks), so it can be fed into a cross-validation for things like leave-one-subject-out. Any measure running inside the CV can then be prepended with the new, yet-to-be-written Hyperalign() mapper, which can work with these special datasets: it computes the common space from the training set and projects input datasets into the common space when necessary.
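A minimal sketch of the proposed structure, with plain numpy object arrays standing in for a real PyMVPA Dataset (the Hyperalign() mapper does not exist yet, so the training/projection step is only indicated in comments; all names here are illustrative):

```python
import numpy as np

# Simulate per-subject datasets with incongruent feature counts
# (a real PyMVPA Dataset would also carry .fa/.a attributes)
subject_data = [np.random.rand(20, n_feat) for n_feat in (50, 60, 55)]

# The "outer" dataset: one sample per subject, stored in an object array
# so numpy does not try to stack the unequal-width matrices.
samples = np.empty(len(subject_data), dtype=object)
for i, s in enumerate(subject_data):
    samples[i] = s

# Sample attributes attached to the outer container, e.g. for
# leave-one-subject-out partitioning.
chunks = np.arange(len(subject_data))

# Leave-one-subject-out folding: train the (hypothetical) common-space
# model on all but one subject, then project the held-out one.
for test_idx in range(len(samples)):
    train = [samples[i] for i in range(len(samples)) if i != test_idx]
    test = samples[test_idx]
    # common_space = Hyperalign().train(train)   # yet-to-be-written mapper
    # projected = common_space.forward(test)
```

The object-dtype array is what allows each "sample" to have a different number of features while still living in one container that partitioners can slice.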

In essence, this PR will be about implementing this mapper and making various other bits of PyMVPA aware of dataset(datasets) in order to be able to do the right thing (TM).

As this could be implemented in a thousand different ways, I am opening this PR right at the start in order to get your comments, @yarikoptic, @nno and @swaroopgj.

mih added 3 commits May 18, 2015 14:32
Not sure if this should be in __init__, or whether we would want a
separate helper function.
@swaroopgj (Member) commented

Interesting idea of datasets as samples. Another use case might be RSA-like analyses, where subjects need not be in the same feature space.

Somewhat related to this, I implemented a Hyperalignment measure for the Searchlight Hyperalignment classes. The issues I had to deal with were 1) handling different numbers of features across subjects, 2) feature selection, and 3) excluding certain subjects from training the model.
If these can be handled outside the Hyperalignment mapper, it should be less complicated.

@nno (Contributor) commented May 18, 2015

I'm not so convinced that samples should be used to store a list of datasets.

  • it breaks the nice and consistent 'all .samples are matrices' idea
  • it does not allow for subjects being in a different feature space, as all subjects would share the same .fa (and .a).

Using lists with datasets makes more sense to me.

@mih (Member, Author) commented May 20, 2015

@nno We actually do not rely on the assumption that .samples is a matrix. If you grep for 'streamline' you can find traces of Emanuele's work on representing fiber streamlines in datasets. In that case each fiber had a different length.

PyMVPA's minimal assumption should be that .samples has len(); beyond that, most bits should work. Naturally, the more specific a piece of functionality is, the more assumptions it has to make, and this flexibility breaks down.

The main advantage of using a dataset as a container instead of a list is that we can attach attributes, use partitioners, etc., which is not easily possible with a list.

@swaroopgj

  1. handling different numbers of features across subjects

There would be no restrictions in this regard. And of course no common feature axis.

  2. feature selection

This one probably needs some changes in the code to be handled nicely for such monster datasets.

  3. excluding certain subjects from training the model

A subject would be a sample; assign an attribute and partition/split accordingly.
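The exclusion mechanism described here can be sketched with plain numpy (a stand-in for the real Dataset/partitioner machinery; the `train_ok` attribute name is hypothetical):

```python
import numpy as np

# Each sample of the outer dataset is one subject, with its own
# (unequal) number of features.
samples = np.empty(4, dtype=object)
for i in range(4):
    samples[i] = np.random.rand(10, 30 + i)

# A per-sample attribute marking which subjects may enter model training;
# a partitioner would consume such an attribute to build splits.
train_ok = np.array([True, True, False, True])

training_set = samples[train_ok]   # subjects used to train the common space
held_out = samples[~train_ok]      # subjects excluded from training
```

Because exclusion is just boolean indexing on a sample attribute, it needs no special support inside the mapper itself.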

@nno (Contributor) commented May 20, 2015

@Hanke thanks for the explanation, you've addressed my concerns.
To make sure we're on the same page: if subj_ds is a dataset whose .samples is a list of datasets, then subj_ds itself has minimal (or no) feature attributes (e.g. no .fa.voxel_indices), while each element in subj_ds.samples carries the feature attributes for its subject (including .fa.voxel_indices for fMRI datasets). Is that correct?

@mih (Member, Author) commented May 20, 2015

@nno in short: yes.

Long version: the "outer" dataset has .fa, but would only have one feature (THE dataset). .fa could be assigned, but I cannot think of a good use case. .samples of the outer dataset is an object array. If you access any element in it, you'll find a fully featured dataset with all the usual feature attributes (like voxel_indices), but also with the freedom to modify it in any way -- like in a list.
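The layout described above can be sketched like this, with dicts standing in for the inner PyMVPA datasets (the attribute layout is only an approximation of what a real Dataset carries):

```python
import numpy as np

n_subjects = 3

# The "outer" dataset: an object array with exactly one feature per
# sample -- that feature being THE per-subject dataset.
outer_samples = np.empty((n_subjects, 1), dtype=object)
for i in range(n_subjects):
    # each cell holds a fully featured per-subject "dataset" with its
    # own feature attributes (voxel_indices etc.)
    n_feat = 40 + 5 * i
    outer_samples[i, 0] = {
        'samples': np.random.rand(20, n_feat),
        'fa': {'voxel_indices': np.arange(n_feat)},
    }

# Accessing any element yields the full inner dataset, which can be
# modified freely -- just like an element of a list.
inner = outer_samples[1, 0]
```

The outer container keeps a regular (n_subjects, 1) shape, so all the usual sample-axis machinery still applies, while the inner datasets remain completely heterogeneous.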

@yarikoptic (Member) commented

travis blows for a reason:

  File "/home/travis/build/PyMVPA/PyMVPA/mvpa2/mappers/hyperalignment.py", line 13, in <module>
    from ..base import Mapper
ImportError: cannot import name Mapper

for i, s in enumerate(samples):
    temp[i] = s
samples = temp
del temp
Member commented:


is it really needed here? doesn't that dataset contain only references to the original ones, and temp would simply be a lightweight beast, so OK to be picked up by gc at any convenience?

yarikoptic and others added 2 commits May 20, 2015 22:29
BF+RF+DOC: use all(iterator) for OPT, fix unused import of Mapper, adjusted docstring
@coveralls commented

Coverage Status

Coverage increased (+0.06%) to 78.83% when pulling 86dba8b on hanke:hypalng into 2e55fd6 on PyMVPA:master.
