Multivariate Data Split - Program to create subsets of a data set that are unlikely to differ from each other with statistical significance.
Subsets are picked based on the pseudo Gower distance: in each iteration, the subjects picked are those with the smallest distance between them.
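The selection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in subset_data.py: the distance is assumed to be the mean range-normalized absolute difference over numeric columns, each iteration is assumed to take one random unassigned subject plus its nearest unassigned neighbours and place one of them in each subset, and the names gower_like_distance and split_into_subsets are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def gower_like_distance(data):
    """Pairwise mean of range-normalized absolute differences (numeric columns only)."""
    data = np.asarray(data, dtype=float)
    value_range = data.max(axis=0) - data.min(axis=0)
    value_range[value_range == 0] = 1.0  # constant columns contribute zero distance
    scaled = data / value_range
    # City-block distance on range-scaled data, averaged over the features.
    return squareform(pdist(scaled, metric="cityblock")) / data.shape[1]


def split_into_subsets(data, num_groups, seed=0):
    """Assumed greedy scheme: pick nearby subjects, put one in each subset."""
    rng = np.random.default_rng(seed)
    dist = gower_like_distance(data)
    labels = np.full(dist.shape[0], -1)
    unassigned = list(range(dist.shape[0]))
    while unassigned:
        # Start from a random unassigned subject.
        first = unassigned[rng.integers(len(unassigned))]
        # Add its closest unassigned neighbours until there is one per subset.
        others = [j for j in unassigned if j != first]
        others.sort(key=lambda j: dist[first, j])
        picked = [first] + others[:num_groups - 1]
        # Spread the picked subjects over the subsets in random order.
        groups = rng.permutation(num_groups)[:len(picked)]
        for subject, group in zip(picked, groups):
            labels[subject] = group
        picked_set = set(picked)
        unassigned = [j for j in unassigned if j not in picked_set]
    return labels
```

Because every iteration places similar subjects in different subsets, the subsets should end up with similar distributions for each phenotype. This can be checked afterwards with a statistical test of your choice.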
a) Phenotype file (numerical values only; please convert information such as sex (M/F) into 0/1)
b) Phenotype list to be used for comparison (see the example phenotype_list.csv file)
c) Number of subsets
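Converting a categorical column before running the program could look like the sketch below. It assumes the phenotype file has a header row and a column named "sex" holding M/F values; the column name, the helper encode_sex_column and the choice of M=0, F=1 are illustrative assumptions, not requirements of subset_data.py.

```python
import csv

# Assumed coding; any consistent 0/1 mapping works.
SEX_CODES = {"M": 0, "F": 1}


def encode_sex_column(in_handle, out_handle, column="sex"):
    """Copy a CSV stream, replacing M/F in `column` with 0/1."""
    reader = csv.DictReader(in_handle)
    writer = csv.DictWriter(out_handle, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Raises KeyError on unexpected values, so bad codes are not silently kept.
        row[column] = SEX_CODES[row[column].strip().upper()]
        writer.writerow(row)
```

The function takes open file handles, for example encode_sex_column(open("raw.csv", newline=""), open("phenotypes.csv", "w", newline="")), where raw.csv is a placeholder name for the unconverted file.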
CSV file with an added column containing the subset IDs.
./subset_data.py INFILE OUTFILE NUM_GROUPS (PHENOTYPE_LIST)
./subset_data.py phenotypes.csv phenotypes_split.csv 5 phenotype_list.csv
numpy, scipy
Schirmer et al. "Rich-Club organization: an important determinant of functional outcome after acute ischemic stroke." BioRxiv (2019): 545897.