
File descriptions

Kirstie Whitaker edited this page Sep 21, 2016 · 4 revisions

Detailed descriptions of files in figshare and GitHub repositories

This information is copied and pasted from the supplementary information associated with the NSPN manuscript "Adolescence is associated with genomically patterned consolidation of the hubs of the human brain connectome".

Data

All data to replicate the analyses presented here are available for download from the DATA.zip compressed directory contained within the project’s figshare repository: NSPN: Adolescent consolidation of human connectome hubs.

DATA.zip download link: https://dx.doi.org/10.6084/m9.figshare.2057796

This directory has already been downloaded and unzipped into this GitHub repository here.

Specifically, DATA.zip contains:

  • DemographicData.csv: Age, gender, study location (Cambridge or UCL), IQ (where available), handedness (where available), ethnicity and socioeconomic status data for 2436 participants in the NSPN 2k cohort.

  • [COHORT]/PARC_500aparc_[MEASURE]_behavmerge.csv: MRI data for each of the three cohorts (discovery, validation and complete) for each of the 308 regions in the 500 mm² parcellation. Separate files are provided for mean and standard deviation values of MT at each measurement depth. Note that the file names do not express fractional depth as measured from the pial surface, as the manuscript does; they keep the original calculation, which measures from the grey/white matter boundary. For example, the values corresponding to MT at 70% cortical depth (30% from the grey/white matter boundary) for all participants in the discovery cohort are in the file DISCOVERY/PARC_500aparc_MT_mean_projfrac+030_mean_behavmerge.csv.

  • PLS_gene_predictor_vars.csv: Gene expression matrix containing expression values for each of 20,737 genes at each of the 306 (of 308) cortical regions that had usable data in the AIBS dataset.

    You can match up the names of the regions with the 500.names.txt file in the FS_SUBJECTS/fsaverageSubP/parcellation directory. We excluded subcortical regions for these analyses, so you have to first exclude the first 41 rows in the 500.names.txt file.

    For example, the column in PLS_gene_predictor_vars.csv headed 0 corresponds to lh_bankssts_part1, and the column headed 1 corresponds to lh_bankssts_part2.

    It is easy to overlook, but the column headings in PLS_gene_predictor_vars.csv are not consecutive. Two indices are missing:

    • 38 : lh_lateraloccipital_part8
    • 213 : rh_parahippocampal_part1

    These are the two regions that did not have usable data from the AIBS dataset and were therefore excluded from the gene analyses.

  • Candidate_genes_oligo.csv: The list of genes from the oligodendrocyte candidate gene set along with their index in the whole-genome gene expression matrix.

  • Candidate_genes_schizophrenia.csv: The list of genes from the schizophrenia-risk candidate gene set along with their index in the whole-genome gene expression matrix.

    The following three files are included because they have been drawn offline and are necessary to create the main manuscript figures using the analysis and figure-generation code provided.

    • CorticalLayers_schematic_cells.jpg: This image is represented in Fig. 2c.
    • CorticalLayers_schematic_methods.jpg: This image is represented in Fig. 2a.
    • Fig3_Enrich_withColourBar.png: This image is represented in Fig. 3g.
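The two naming conventions described above (fractional depths counted from the grey/white matter boundary, and gene-matrix column headings that index the 308 cortical regions) can be sketched in Python as follows. This is only an illustration and is not part of the published analysis code: the helper names `projfrac_label` and `gene_column_regions` are hypothetical, the signed, zero-padded `projfrac+030` format is inferred from the single file name quoted above, and the region names are assumed to have been read, one per line, from 500.names.txt.

```python
N_NON_CORTICAL = 41  # leading non-cortical entries in 500.names.txt


def projfrac_label(depth_from_pial_percent):
    """Convert a depth measured from the pial surface (manuscript convention)
    into the file-name tag, which counts from the grey/white matter boundary.

    Assumption: the tag is a sign plus three digits, as in 'projfrac+030'.
    """
    if not 0 <= depth_from_pial_percent <= 100:
        raise ValueError("depth must be between 0 and 100 percent")
    from_boundary = 100 - depth_from_pial_percent
    return "projfrac+{:03d}".format(from_boundary)


def gene_column_regions(column_headings, all_region_names):
    """Map integer column headings of the gene matrix to cortical region names.

    Returns (mapping, missing): `mapping` is {heading: region name} and
    `missing` lists the cortical indices that have no column.
    """
    # Drop the non-cortical entries so that index 0 is the first cortical region.
    cortical = list(all_region_names)[N_NON_CORTICAL:]
    mapping = {}
    for heading in column_headings:
        try:
            index = int(heading)
        except ValueError:
            continue  # ignore any non-numeric heading (hypothetical label column)
        mapping[heading] = cortical[index]
    present = {int(heading) for heading in mapping}
    missing = [i for i in range(len(cortical)) if i not in present]
    return mapping, missing
```

If the files match the description above, `projfrac_label(70)` gives the tag used in the example file name, and `missing` should contain the two indices listed for the regions without usable AIBS data.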

Freesurfer Parcellation

The freesurfer parcellation files we used to match up our participants to a standard parcellation are available for download from the FS_SUBJECTS.zip compressed directory contained within the project's figshare repository. Many files are shipped with freesurfer as standard and we provide them here to aid in the reproducibility of our analyses. The files we created are in the parcellation and label directories.

  • 500.names.txt: The list of regions in the parcellation. Note that there are 41 non-cortical regions at the beginning of this file. All analyses removed these 41 regions to leave a list of 308 cortical regions.

  • 500.centroids.txt: The x, y, z coordinates of all of the regions in 500.names.txt. (Note again that the first 41 entries are non-cortical and were not included in our analyses.)

  • ?h.500.aparc.annot: These are the freesurfer-style "annot" files delineating the 308 cortical regions for the left and right hemispheres separately.
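As a rough sketch of how the names and centroids files line up, the snippet below pairs each cortical region with its coordinates after dropping the 41 non-cortical entries. The function name `cortical_centroids` is hypothetical, and the file layouts (one region name per line, three whitespace-separated numbers per line) are assumptions rather than something this page specifies.

```python
N_NON_CORTICAL = 41  # non-cortical entries at the start of both files


def cortical_centroids(names_text, centroids_text):
    """Pair cortical region names with their (x, y, z) centroids.

    Both arguments are the full text of the respective file. The two files
    are assumed to list regions in the same order.
    """
    names = [line.strip() for line in names_text.splitlines() if line.strip()]
    coords = [
        tuple(float(value) for value in line.split())
        for line in centroids_text.splitlines()
        if line.strip()
    ]
    if len(names) != len(coords):
        raise ValueError("names and centroids files have different lengths")
    # Keep only the cortical entries, preserving the original order.
    return list(zip(names[N_NON_CORTICAL:], coords[N_NON_CORTICAL:]))
```

With the real files, the returned list would be expected to have 308 entries; checking that length is a cheap guard against an off-by-one when trimming.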

Analysis Code

All analysis code necessary to replicate the analyses presented in the paper is available for download from the SCRIPTS.zip compressed directory contained within the project’s figshare repository. NSPN_CorticalMyelination_AnalysisWrapper.py calls all other functions (stored in the SCRIPTS folder) and gives a clear explanation of the analysis steps, including the automated creation of figures and tables.

SCRIPTS.zip download link: https://dx.doi.org/10.6084/m9.figshare.2057805

This directory has already been downloaded and unzipped into this GitHub repository here.

NSPN_CorticalMyelination_AnalysisWrapper.py download link: https://dx.doi.org/10.6084/m9.figshare.2057808

This file has already been downloaded into this GitHub repository here.

Dependencies

The analysis code is dependent on the following software packages:

| Name | Link | Version |
| --- | --- | --- |
| FSL | http://fsl.fmrib.ox.ac.uk/fsl/fslwiki | 5.0.6 |
| Freesurfer | http://freesurfer.net | freesurfer-Linux-centos4_x86_64-stable-pub-v5.3.0 |
| Pysurfer | http://pysurfer.github.io | 0.6 |
| Nibabel | http://nipy.org/nibabel | 1.2.0 |
| Anaconda | http://docs.continuum.io/anaconda/index | 2.3.0 (64-bit) |
| Python | via Anaconda (above) | 2.7.10 |
| Conda | via Anaconda (above) | 3.15.1 |
| Seaborn | http://stanford.edu/~mwaskom/software/seaborn | 0.6.0 |
| Networkx | https://networkx.github.io | 1.9.1 |
| Community | http://perso.crans.org/aynaud/communities/index.html | 0.3 |
| Statsmodels | http://statsmodels.sourceforge.net | 0.6.1 |
| Matlab | http://uk.mathworks.com | 2012b |
| Matlab statistical and machine learning toolbox | http://uk.mathworks.com/products/statistics | 2012b |
| MiKTeX | http://miktex.org/ | 2.9 |

Results

The results folder is not necessary and will be created by the analysis code if it does not already exist. However, many of the results presented here rely on permutation tests and therefore results may not be identical across different runs of the same analysis code. We encourage readers to replicate the analyses themselves (as we have done many times) but provide in the project’s figshare repository the specific output files that were used in the creation of this manuscript.
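To illustrate why permutation-based results can vary between runs, here is a minimal, generic permutation test for a difference in group means. It is not taken from the analysis code: the function name `permutation_p_value`, the test statistic and the optional `seed` argument are all assumptions made for this sketch. Without a fixed seed, each run shuffles the data differently, so the estimated P value changes slightly from run to run.

```python
import random


def _mean(values):
    return sum(values) / float(len(values))


def permutation_p_value(group_a, group_b, n_perm=1000, seed=None):
    """Two-sided permutation test for a difference in means.

    Passing the same `seed` reproduces the same shuffles and therefore the
    same P value; leaving it as None gives run-to-run variation.
    """
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1.0) / (n_perm + 1.0)
```

Calling the function twice with the same seed returns the same value, whereas unseeded calls will generally differ in the later decimal places, which is the kind of variation the archived output files sidestep.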

The results folder is very large and exceeds figshare’s file size limits. Therefore, it has been split into four parts within the project’s figshare repository. The discovery, validation and complete analyses are provided separately (DISCOVERY.zip, VALIDATION.zip and COMPLETE.zip), with a compressed folder containing the remaining directories that integrate those results as a fourth file CT_MT_ANALYSES.zip.

The DISCOVERY, VALIDATION and COMPLETE directories should be unzipped so they are inside the CT_MT_ANALYSES directory (which has already been done in this GitHub repository). Combined, these folders contain all the output used to create this manuscript including the tables, high (and low) resolution figures and movies.

CT_MT_ANALYSES.zip download link: https://dx.doi.org/10.6084/m9.figshare.1618815

DISCOVERY.zip download link: https://dx.doi.org/10.6084/m9.figshare.2057811

VALIDATION.zip download link: https://dx.doi.org/10.6084/m9.figshare.2057814

COMPLETE.zip download link: https://dx.doi.org/10.6084/m9.figshare.2057820

These directories have already been downloaded, unzipped and appropriately reorganised into this GitHub repository here.
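For a fresh download from figshare, the reorganisation described above could be sketched as follows. This assumes Python 3 (not the Python 2.7 used by the analysis code) and assumes that each archive contains a single top-level directory named after the archive (for example DISCOVERY/ inside DISCOVERY.zip); that layout is a guess and should be checked before relying on the sketch. The function name `unpack_results` is hypothetical.

```python
import zipfile
from pathlib import Path

COHORT_ARCHIVES = ("DISCOVERY.zip", "VALIDATION.zip", "COMPLETE.zip")


def unpack_results(download_dir, target_dir):
    """Unzip the four results archives so that the cohort directories end up
    inside CT_MT_ANALYSES. Returns the path to CT_MT_ANALYSES.
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)

    # Extract the integrating directory first.
    with zipfile.ZipFile(download_dir / "CT_MT_ANALYSES.zip") as archive:
        archive.extractall(target_dir)
    results_dir = target_dir / "CT_MT_ANALYSES"
    results_dir.mkdir(parents=True, exist_ok=True)

    # Then place each cohort directory inside it.
    for name in COHORT_ARCHIVES:
        with zipfile.ZipFile(download_dir / name) as archive:
            archive.extractall(results_dir)
    return results_dir
```

A call such as `unpack_results("downloads", ".")` would then leave `CT_MT_ANALYSES/DISCOVERY`, `CT_MT_ANALYSES/VALIDATION` and `CT_MT_ANALYSES/COMPLETE` under the current directory, provided the archives are laid out as assumed.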

Supplementary Files

All supplemental files have been uploaded to the project’s figshare repository as SUPPLEMENTAL_FILES.zip.

SUPPLEMENTAL_FILES.zip download link: https://dx.doi.org/10.6084/m9.figshare.1618810

This directory has already been downloaded and unzipped into this GitHub repository here.

Specifically, SUPPLEMENTAL_FILES.zip contains:

  • WhitakerVertes_PLSEnrichmentGeneList.xlsx: This file contains the list of significant enrichment terms for both the most positively weighted (up-regulated) genes and the most negatively weighted (down-regulated) genes in the second PLS component (PLS2) separately for the discovery, validation and complete cohorts. The enrichment of down-regulated genes was obtained by providing the inverse ranking of genes to the GOrilla software tool. Redundant terms, as determined by the REVIGO online software tool, are shaded in grey in the supplementary file, to highlight the most meaningful GO annotations. We also shaded out terms annotated to over 1000 genes given their generality (for example “cell communication”).

  • GO_[COHORT]_PLS2_[DIR].png: These six figures are very large and provide a detailed visualization of all significantly enriched biological processes embedded in the hierarchical tree of GO terms for PLS2. For example, GO_complete_PLS2_pos.png is the original tree (including process labels) presented in Fig. 3g and represents up-regulated biological processes in PLS2 in the complete cohort. GO_complete_PLS2_neg.png represents down-regulated biological processes in PLS2 in the complete cohort. The colour-coding of boxes represents the degree of significance of each term based on its uncorrected P value: white (P > 10⁻³), yellow (10⁻⁵ < P < 10⁻³), pale orange (10⁻⁷ < P < 10⁻⁵), orange (10⁻⁹ < P < 10⁻⁷), red (P < 10⁻⁹). Note that the colourbars for the corresponding figure in the main text have been adapted to represent FDR-corrected P values, and the file WhitakerVertes_PLSEnrichmentGeneList.xlsx reports both FDR-corrected and uncorrected P values.

  • RegionalMeasures_[COHORT].pdf: These three tables present summary measures of baseline CT, baseline MT, ΔCT with age, ΔMT with age, PLS2 weightings, degree and closeness for the discovery, validation and complete cohorts. The P values are not corrected for multiple comparisons but are presented in order to assess the patterns in the data. All MT values in this table are sampled at 70% cortical depth. Values were calculated based on 308 regions but for readability we present median values from all sub-regions within each of the 34 Desikan-Killiany atlas regions. Tables separated by hemisphere (68 regions) and for all 308 regions are available in the DISCOVERY.zip, VALIDATION.zip and COMPLETE.zip compressed directories in the project’s figshare repository, within the TABLES subdirectory for each cohort.
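The summarising step used for the RegionalMeasures tables (taking the median over all sub-regions of each Desikan-Killiany atlas region, with both hemispheres pooled) might look roughly like the sketch below. It is an illustration under assumptions, not the authors' implementation: the function name `median_by_atlas_region` is hypothetical, and sub-region names are assumed to follow the `lh_bankssts_part1` pattern seen in 500.names.txt, with the `_partN` suffix treated as optional.

```python
import re
from collections import defaultdict
from statistics import median

# Hemisphere prefix, atlas region name, optional "_partN" suffix.
_SUBREGION = re.compile(r"^(?:lh|rh)_(.+?)(?:_part\d+)?$")


def median_by_atlas_region(values_by_subregion):
    """Collapse {sub-region name: value} to {atlas region: median value},
    pooling left and right hemispheres.
    """
    grouped = defaultdict(list)
    for name, value in values_by_subregion.items():
        match = _SUBREGION.match(name)
        if match is None:
            raise ValueError("unexpected region name: %r" % name)
        grouped[match.group(1)].append(value)
    return {region: median(values) for region, values in grouped.items()}
```

Keeping the hemisphere prefix in the grouping key instead of discarding it would give the 68-region, hemisphere-separated variant mentioned above.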