WiP: Next generation Hyperalignment API #319
base: master
Conversation
Not sure if this should be in __init__, or whether we would want a separate helper function.
Interesting idea of datasets as samples. Another use case might be RSA-like analyses, where subjects need not be in the same feature space. Somewhat related to this, I implemented a Hyperalignment measure for Searchlight Hyperalignment classes. The issues I had to deal with were 1) handling different numbers of features across subjects, 2) feature selection, and 3) excluding certain subjects from training the model.
I'm not so convinced that samples should be used to store a list of datasets.
Using lists with datasets makes more sense to me.
@nno We actually do not rely on the assumption that .samples is a matrix. If you grep for 'streamline' you can find traces of Emanuele's work on representing fiber streamlines in datasets. In that case each fiber had a different length. PyMVPA's minimal assumption should be that .samples has len(), but other than that most bits should work. Naturally, the more specific the functionality, the more assumptions it will have to make, and this flexibility breaks down. The main advantage of using a dataset as a container instead of a list is that we can attach attributes, use partitioners, ... which is not easily possible with a list.
There would be no restrictions in this regard. And of course no common feature axis.
This one probably needs some changes in the code to be handled nicely for such monster datasets.
A subject would be a sample, assign an attribute and partition/split accordingly.
@Hanke thanks for the explanation, you've addressed my concerns.
@nno in short: yes. Long version: the "outer" dataset has .fa but would only have one feature (THE dataset). .fa could be assigned, but I cannot think of a good use case. .samples of the outer dataset is an object array. If you access any element in it, you'll find a fully featured dataset with all the usual feature attributes (like voxel_indices), but also the freedom to modify it in any way -- like in a list.
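A minimal sketch of such an "outer" dataset, assuming Dataset tolerates an object-dtype samples array (the toy data and the name `subject_dss` are illustrative only, not part of the PR):

```python
import numpy as np
from mvpa2.datasets import Dataset

# toy per-subject datasets with differing numbers of features
subject_dss = [Dataset(np.random.randn(20, nf)) for nf in (100, 120, 90)]

# one "outer" sample per subject, a single feature holding THE dataset
samples = np.empty((len(subject_dss), 1), dtype=object)
for i, ds in enumerate(subject_dss):
    samples[i, 0] = ds

outer = Dataset(samples,
                sa={'subject': np.arange(len(subject_dss)),
                    'chunks': np.arange(len(subject_dss))})

inner = outer.samples[0, 0]   # a fully featured Dataset again
```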
travis blows for a reason:
```python
# copy dataset references into the pre-allocated object array 'temp'
for i, s in enumerate(samples):
    temp[i] = s
samples = temp
del temp
```
is it really needed here? Isn't it that the dataset contains only references to the original ones, and temp will simply be a lightweight beast, so it would be OK to be picked up by the gc at its convenience?
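For context, a quick illustrative check (not part of the PR) that numpy object arrays indeed store only references, so temp holds no copies of the datasets:

```python
import numpy as np

datasets = [object(), object()]
temp = np.empty(len(datasets), dtype=object)
for i, ds in enumerate(datasets):
    temp[i] = ds

# the array elements are the very same objects, not copies
assert temp[0] is datasets[0]
```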
BF+RF+DOC: use all(iterator) for OPT, fix unused import of Mapper, adjusted docstring
This is about making the incomprehensible folding strategy shown here
http://www.pymvpa.org/examples/hyperalignment.html
much simpler to write and read.
The basic idea is to start with a dataset that has other datasets as samples whenever dataset features are not congruent across, e.g., subjects. Such a dataset can have the usual attributes (e.g. chunks) and can therefore be fed into a cross-validation for things like leave-one-subject-out. Any measure running inside the CV can now be prepended with the new, yet-to-be-written Hyperalign() Mapper that can work with these special datasets, computes the common space from the training set, and projects input datasets into the common space when necessary.
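A sketch of how the leave-one-subject-out folding could then look, assuming the object-array `outer` dataset from the sketch above; Hyperalign() does not exist yet, so only the partitioning half of the workflow is shown:

```python
from mvpa2.generators.partition import NFoldPartitioner

# each fold leaves one subject (= one sample of 'outer') out
for part in NFoldPartitioner(attr='chunks').generate(outer):
    training = part[part.sa.partitions == 1]  # subjects to derive the common space from
    testing = part[part.sa.partitions == 2]   # held-out subject to project into it
```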
In essence, this PR will be about implementing this mapper and making various other bits of PyMVPA aware of dataset(datasets) in order to be able to do the right thing (TM).
As this could be implemented in a thousand different ways, I am opening this PR right at the start in order to get your comments @yarikoptic @nno and @swaroopgj.