[WIP] An implementation of discontiguous sampling of the SRerf variant; we call it MTORF(?) #353

Open
wants to merge 47 commits into
base: staging


@adam2392 adam2392 commented Dec 1, 2020

Summary

@ChesterHuynh and I were interested in extending the SRerf variant, which seems to work very well on low-sample image datasets, to low-sample multivariate time series (mts). A corresponding issue was created here: adam2392#1 to discuss and design how this might look. This PR addresses that issue and implements MTORF(?).

We would love some feedback and, ideally, to get this merged so that we can "pip install" this variant.

Details of Implementation

Assuming that mts are structured as (S x T), where S is the number of time series signals and T is the number of time points, MTORF essentially discontiguizes the sampling along the row dimension (S) while keeping contiguous chunks in time (T).
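To make the sampling scheme concrete, here is a minimal Python sketch of how one such patch could be drawn. It is only an illustration of the idea, not the C++ code in this PR: the function name `sample_mts_patch`, its parameters (`max_rows`, `max_width`), and the row-major flattening of the (S x T) matrix are all assumptions made for the example.

```python
import numpy as np


def sample_mts_patch(n_signals, n_times, max_rows, max_width, rng):
    """Return flat feature indices for one hypothetical MTORF-style patch."""
    # Pick how many rows and how wide a time window this patch spans.
    n_rows = rng.integers(1, max_rows + 1)
    width = rng.integers(1, max_width + 1)
    # Rows are drawn without replacement from anywhere in S (discontiguous).
    rows = np.sort(rng.choice(n_signals, size=n_rows, replace=False))
    # The time window is one contiguous chunk shared by all chosen rows.
    start = rng.integers(0, n_times - width + 1)
    cols = np.arange(start, start + width)
    # Map (row, col) pairs to indices of the row-major flattened S x T matrix
    # (the flattening order is an assumption of this sketch).
    return (rows[:, None] * n_times + cols[None, :]).ravel()


rng = np.random.default_rng(0)
idx = sample_mts_patch(n_signals=6, n_times=1000, max_rows=3, max_width=50, rng=rng)
rows, cols = np.unravel_index(idx, (6, 1000))
```

A contiguous-patch sampler in the SRerf style would instead draw `rows` as a consecutive range, which is the only difference this sketch is meant to highlight.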

@ChesterHuynh did a C++ implementation, included in the attached code, and we have been running experiments with it to support some of our studies. I summarize them below.

Studies to Back it up

  1. Simulation of a Multivariate Gaussian With Noisy Samples in Between
    First, we ran a simulation study that takes a 3-dim Gaussian and generates 3 white noise signals. We generate ~1000 samples of each, then stack them as follows:
signal = 3-dim Gaussian
noise_1 = white noise
noise_2 = white noise
noise_3 = white noise

# interleave signal and noise rows; this is now a 6 x 1000 array
noisy_signal = np.vstack((signal[0], noise_1, signal[1], noise_2, signal[2], noise_3))

This was the result:
image

This essentially demonstrates when MTORF, rather than SRerf, is desirable. It motivated us to proceed with some real data.
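For anyone who wants to reproduce the shape of the simulated input, a runnable sketch of the interleaving described above might look like the following. The mean, the covariance matrix, and the random seed are assumptions chosen for illustration only; the values used in the actual study are not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Assumed parameters: illustrative only, not the values used in the study.
mean = np.zeros(3)
cov = np.array([[1.0, 0.8, 0.6],
                [0.8, 1.0, 0.8],
                [0.6, 0.8, 1.0]])

# 3-dim Gaussian, transposed to shape (3, n_samples).
signal = rng.multivariate_normal(mean, cov, size=n_samples).T
# Three independent white-noise rows, shape (3, n_samples).
noise = rng.standard_normal((3, n_samples))

# Interleave: signal rows land at 0, 2, 4 and noise rows at 1, 3, 5,
# so the informative rows are not adjacent to one another.
noisy_signal = np.empty((6, n_samples))
noisy_signal[0::2] = signal
noisy_signal[1::2] = noise
```

With this layout, no contiguous block of rows contains more than one informative row without also containing noise, which is the situation the discontiguous row sampling is meant to handle.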

  2. Classification task for epilepsy:
    I used this variant when I set up an epilepsy outcome classification task based on the quantiles of features computed from iEEG data around a seizure onset. It was very helpful because I could exploit the fact that my input matrix was correlated in time without having to impose that each quantile be correlated with its neighboring quantiles (SRerf vs MTORF). This example is a bit difficult to explain, so I'm happy to add more details if desired.

  3. Motor decoding from iEEG data:
    Chester and I are currently working on a research project trying to decode motor movements (L, R, Up, Down) from iEEG signals. We hypothesize that only a subset of the iEEG data that we recorded is actually useful for decoding movement, and hence the MTORF variant is particularly useful.

Additional Information

Jesse helped me navigate where we might want to make the code change back in Feb 2020(?). Lol, sorry for the delay in floating this back up. Jovo initially showed me the SRerf variant during, I think, a summer workshop he hosted. I probably should do more tests comparing the different variants, but haven't found the time. I also briefly discussed things w/ Ronan and Hayden a long, long time ago, so just trying to get this back on track :p.

Any critiques are appreciated.

ChesterHuynh and others added 30 commits March 18, 2020 16:21

netlify bot commented Dec 1, 2020

Deploy preview for rerf failed.

Built with commit fcfabaa

https://app.netlify.com/sites/rerf/deploys/5fc67a8a27fa4f00074b09d8


adam2392 commented Dec 1, 2020

Currently, some tests fail for me with:

    def test_urerf(projection_matrix):
        n_samples = 100
        n_classes = 2
        X, y = make_blobs(
            n_samples=n_samples, centers=n_classes, n_features=2, random_state=2 ** 4
        )
    
        clf = UnsupervisedRandomForest(projection_matrix=projection_matrix)
        clf.fit(X)
        sim_mat = clf.transform()
    
        assert np.array_equal(sim_mat.diagonal(), np.ones(n_samples))
    
        cluster = AgglomerativeClustering(n_clusters=n_classes).fit(sim_mat)
        predict_labels = cluster.fit_predict(sim_mat)
        score = adjusted_rand_score(y, predict_labels)
>       assert score > 0.9
E       assert 0.48526863084922006 > 0.9

Not sure if this is related to our changes, tho.

} // END randMatStructured


inline void randMatMultivariateTimePatchv2(std::vector<weightedFeature>& featuresToTry, std::vector<std::vector<int> > patchPositions){

This version can be safely ignored; I need to get rid of it.
