BrainSeq Phase II project led by LIBD for the BrainSeq Consortium

LieberInstitute/brainseq_phase2

BrainSeq Phase II analysis

This repository contains analysis code for the BrainSeq Phase II project from the BrainSeq Consortium carried out by researchers at the Lieber Institute for Brain Development.

If you wish to visualize the eQTL results described in this project, please use the LIBD eQTL browser.

License

Attribution-NonCommercial: CC BY-NC

This license lets others remix, tweak, and build upon our work non-commercially as long as they acknowledge our work.

View License Deed | View Legal Code

Citation

If you use anything in this repository please cite the following publication:

Collado-Torres L, Burke EE, Peterson A, Shin JH, Straub RE, Rajpurohit A, Semick SA, Ulrich WS, BrainSeq Consortium, Price AJ, Valencia C, Tao R, Deep-Soboslay A, Hyde TM, Kleinman JE, Weinberger DR, Jaffe AE. Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia. Neuron. 2019. DOI 10.1016/j.neuron.2019.05.013.

Pre-print: bioRxiv, 2018, DOI 10.1101/426213.

Files

| directory | contents |
| --- | --- |
| brainspan | Code for processing the BrainSpan data. |
| browser | Code for creating the files for the eQTL browser. Contains a detailed README file. |
| bsp1 | eQTL replication with BrainSeq Phase I DLPFC polyA+ data. See this README for the results. |
| caseControl | Initial (unused) exploratory code for the SCZD case-control analysis. The final code is in the qsva_brain repository. |
| caseControl_HIPPO_checks | Code for checking the SCZD case-control HIPPO results. The final code is in the qsva_brain repository. |
| casecontrolint | Code for the brain region and SCZD diagnosis status interaction DE analysis. |
| cellComp | RNA cell fraction deconvolution. Contains a detailed README file. |
| check_expr | Check genes not expressed at other feature levels. Contains a detailed README file. |
| check_noQsva | Check gene-level SCZD vs control DEG results without adjusting for qSVs. Contains a detailed README file. |
| check_protein_coding | Check for protein coding and non-coding enrichment/depletion. Contains a detailed README file. |
| check_sex/casecontrol | Check SCZD case-control results by sex. Contains a detailed README file. |
| correlation | DLPFC and HIPPO expression correlation analyses. |
| demographics | Code for exploring demographic variables such as RIN. |
| development | Code for the DE analysis across age using a linear spline model. |
| eQTL_GWAS_riskSNPs | eQTL analysis using PGC2 GWAS risk SNPs and neighboring SNPs. |
| eQTL_full | Genome-wide eQTL analyses. |
| eQTL_full_GTEx | Replication eQTL analyses using GTEx data. |
| expr_cutoff | Code for filtering the features with low expression values and creating the RSE objects used throughout the project. |
| gtex | Code for processing the HIPPO GTEx data and preparing the genotype data for the eQTL analysis. |
| gtex_both | Code for merging the DLPFC and HIPPO GTEx data. |
| gtex_dlpfc | Code for processing the DLPFC GTEx data. |
| preprocessed_data | Code for processing the RNA-seq reads. Uses the LIBD RNA-seq pipeline developed by EE Burke, L Collado-Torres, and AE Jaffe. |
| region_specific | Code for the DE analyses between HIPPO and DLPFC for prenatal and adult controls. |
| supp_tabs | Code for creating some supplementary tables. |
| twas | Perform TWAS analysis using the FUSION TWAS software by Gusev et al., Nature Genetics, 2016. Contains a detailed README file. |
| misc | Files from early explorations including checking for some sample swaps and quality checks. |
| KCNQ1_snp_check | Random check. |

Public Files

Supplementary Figures and Tables via Mendeley Data at DOI 10.17632/3j93ybf4md.1.

| File ID | Details | Description | JHPCE path | URL |
| --- | --- | --- | --- | --- |
| f1 | details | Table S15. Genome wide significant eQTL snp-feature pairs | /dcl01/lieber/ajaffe/lab/brainseq_phase2/supp_tabs/SupplementaryTableXX_eQTL.tar.gz | AWS |
| f2 | details | Unfiltered gene RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_gene_unfiltered.Rdata | AWS |
| f3 | details | Unfiltered exon RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_exon_unfiltered.Rdata | AWS |
| f4 | details | Unfiltered exon-exon junction RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_jxn_unfiltered.Rdata | AWS |
| f5 | details | Unfiltered transcript RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_tx_unfiltered.Rdata | AWS |
| f6 | details | DLPFC vs HIPPO DEG objects (adult and prenatal, includes BrainSpan replication and cell RNA fraction sensitivity results) | /dcl01/lieber/ajaffe/lab/brainseq_phase2/region_specific/rda/RegionSpecificDEGobjects.tar.gz | AWS |
| f7 | details | Development DEG objects (includes BrainSpan replication and cell RNA fraction sensitivity results) | /dcl01/lieber/ajaffe/lab/brainseq_phase2/development/rda/DevelopmentDEGobjects.tar.gz | AWS |
| f8 | details | SCZD vs non-psychiatric control DEG objects (includes qSVs as well as results for the interaction and no-qSVA gene-level sensitivity analyses). See BrainSeq Phase I SCZD DE features for more. | /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/SCZDvsControlDEGobjects.tar.gz | AWS |
| f9 | details | Demographic table including cell RNA fraction estimates | /dcl01/lieber/ajaffe/lab/brainseq_phase2/cellComp/methprop_pd.Rdata | AWS |
| f10 | details | TWAS DLPFC weights | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/DLPFC/DLPFC_weights.tar.gz | AWS |
| f11 | details | TWAS HIPPO weights | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/HIPPO/HIPPO_weights.tar.gz | AWS |
| f12 | details | TWAS results R objects | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/rda/TWAS_results.tar.gz | AWS |
| f13 | details | FASTQ files for DLPFC | /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/ | Globus Endpoint; collection: jhpce#bsp2-dlpfc |
| f14 | details | FASTQ files for HIPPO | /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/ | Globus Endpoint; collection: jhpce#bsp2-hippo |
| f15 | details | BAM files for HIPPO and DLPFC | /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/DLPFC_RiboZero/HISAT2_out/ and /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Hippo_RiboZero/HISAT2_out/ | Globus Endpoints for DLPFC and HIPPO; collections: jhpce#bsp2-dlpfc-bam and jhpce#bsp2-hippo-bam |
| f16 | details | BSP1 re-processed using hg38 at the gene, exon, junction, and transcript expression levels | /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/ | gene (AWS), exon (AWS), jxn (AWS), tx (AWS) |

File details

If the information below is insufficient, check the corresponding scripts or use GitHub's search feature to find where each of the R objects was created. If you have questions about the files, please open an issue.

Common R objects used

SupplementaryTable15_eQTL.tar.gz

f1

Tables with the significant eQTL associations (FDR < 1%) for DLPFC, HIPPO and the brain region interaction (DLPFC vs HIPPO) at the gene, exon, exon-exon junction and transcript expression levels.

Depending on the expression level and model, the tables include additional columns for the data used for replication: GTEx, the CAUC only sub-analysis, or BrainSeq Phase 1. The main columns are:

  • snp: SNP ID.
  • feature_id: expression feature ID. You might also want the EnsemblGeneID column (Ensembl gene ID) or the Symbol one (gene symbol, when available).
  • statistic: eQTL t-statistic computed by MatrixEQTL.
  • pvalue: p-value.
  • FDR: FDR adjusted p-value.
  • beta: eQTL beta coefficient.

For the reference and alternative alleles (note that some variants are insertions), check the newRef and newCount columns respectively in the SNP annotation file BrainSeqPhaseII_snp_annotation.txt (the column names are lower case in that file) that you can match using the snp column.
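
As a rough illustration of the matching step described above, the sketch below filters eQTL rows by FDR and attaches the alleles by joining on the snp column. It is written in Python purely for illustration. The tab-delimited layout, the toy rows and the `annotate_significant` helper are assumptions, not part of this repository; only the column names (snp, feature_id, statistic, pvalue, FDR, beta, and the lower case newref and newcount) come from the description above.

```python
import csv
import io

# Hypothetical toy rows; the values and the tab-delimited layout are assumptions.
EQTL_TSV = (
    "snp\tfeature_id\tstatistic\tpvalue\tFDR\tbeta\n"
    "rs0001\tfeatA\t8.1\t1e-12\t0.0001\t0.42\n"
    "rs0002\tfeatB\t2.0\t0.04\t0.20\t0.05\n"
    "rs0003\tfeatC\t-6.5\t1e-9\t0.005\t-0.31\n"
)
# Column names are lower case in the SNP annotation file.
ANNOT_TSV = (
    "snp\tnewref\tnewcount\n"
    "rs0001\tA\tG\n"
    "rs0002\tC\tT\n"
    "rs0003\tT\tTA\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def annotate_significant(eqtl_rows, annot_rows, fdr_cutoff=0.01):
    """Keep rows with FDR below the cutoff and add ref/alt alleles matched by snp."""
    alleles = {row["snp"]: (row["newref"], row["newcount"]) for row in annot_rows}
    annotated = []
    for row in eqtl_rows:
        if float(row["FDR"]) < fdr_cutoff:
            ref, alt = alleles.get(row["snp"], (None, None))
            annotated.append({**row, "ref": ref, "alt": alt})
    return annotated


significant = annotate_significant(read_tsv(EQTL_TSV), read_tsv(ANNOT_TSV))
for row in significant:
    print(row["snp"], row["feature_id"], row["beta"], row["ref"], row["alt"])
```

The default cutoff of 0.01 mirrors the FDR < 1% threshold stated above; SNPs missing from the annotation are kept with empty alleles rather than dropped.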

rse_gene_unfiltered.Rdata

f2

  • Script that created it: export.sh
  • Contents:

TODO

rse_exon_unfiltered.Rdata

f3

  • Script that created it: export.sh
  • Contents:

TODO

rse_jxn_unfiltered.Rdata

f4

  • Script that created it: export.sh
  • Contents:

TODO

rse_tx_unfiltered.Rdata

f5

  • Script that created it: export.sh
  • Contents:

TODO

RegionSpecificDEGobjects.tar.gz

f6

  • Script that created it: export.sh
  • Contents:

TODO

DevelopmentDEGobjects.tar.gz

f7

  • Script that created it: export.sh
  • Contents:

TODO

SCZDvsControlDEGobjects.tar.gz

f8

  • Script that created it: export.sh
  • Contents:

brainseq_phase2_qsvs_age17_noHGold_DLPFC.Rdata

qSV information for DLPFC. The 'noHGold' part of the file name (no HIPPO 'RiboZero Gold' samples) is kept only for file name consistency, since no HIPPO samples were considered for this set of qSVs.

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold_DLPFC.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| qsvBonf | prcomp | qSVs in the original object format |
| qSVs | matrix | Matrix of qSVs used for building the model matrices |
| mod | matrix | Model matrix without qSVs |
| modQsva | matrix | Model matrix with qSVs |
| keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
  • Details:
keepIndex :  int [1:379] 1 2 3 4 5 6 7 8 9 10 ...
mod :  num [1:379, 1:13] 1 1 1 1 1 1 1 1 1 1 ...
modQsva :  num [1:379, 1:28] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
 $ sdev    : num [1:379] 15.07 4.28 2.98 2.49 2.15 ...
 $ rotation: num [1:1000, 1:379] 0.034 0.0494 0.0322 0.0414 0.0319 ...
 $ center  : Named num [1:1000] 5.16 6.6 6.15 6.81 5.47 ...
 $ scale   : logi FALSE
 $ x       : num [1:379, 1:379] -23.32 4.42 14.46 13.68 -22.33 ...
qSVs :  num [1:379, 1:15] -23.32 4.42 14.46 13.68 -22.33 ...
keepIndex mod modQsva qsvBonf qSVs
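
In the Details above, the first values of qSVs match the first values of qsvBonf$x, which is consistent with qSVs being the leading columns of the principal component scores. The Python sketch below illustrates that relationship on toy numbers, along with the usual way of turning the sdev element of a prcomp object into proportions of variance. The helper names and the toy values are hypothetical.

```python
def leading_columns(scores, k):
    """Return the first k columns of a row-major matrix (a list of rows)."""
    if k < 0 or (scores and k > len(scores[0])):
        raise ValueError("k must be between 0 and the number of columns")
    return [row[:k] for row in scores]


def proportion_of_variance(sdev):
    """Proportion of variance per component: sdev^2 divided by the sum of sdev^2."""
    variances = [s * s for s in sdev]
    total = sum(variances)
    return [v / total for v in variances]


# Toy stand-ins for qsvBonf$x and qsvBonf$sdev (made-up values, unrelated to each other).
toy_scores = [
    [-2.0, 0.5, 0.1],
    [0.0, -1.0, -0.2],
    [2.0, 0.5, 0.1],
]
toy_sdev = [3.0, 2.0, 1.0]

toy_qsvs = leading_columns(toy_scores, 2)
toy_share = proportion_of_variance(toy_sdev)
print(toy_qsvs)
print([round(p, 3) for p in toy_share])
```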

brainseq_phase2_qsvs_age17_noHGold_HIPPO.Rdata

qSV information for HIPPO after dropping the 'RiboZero Gold' HIPPO samples.

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold_HIPPO.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| qsvBonf | prcomp | qSVs in the original object format |
| qSVs | matrix | Matrix of qSVs used for building the model matrices |
| mod | matrix | Model matrix without qSVs |
| modQsva | matrix | Model matrix with qSVs |
| keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
  • Details:
keepIndex :  int [1:333] 454 455 456 457 458 459 460 461 462 463 ...
mod :  num [1:333, 1:13] 1 1 1 1 1 1 1 1 1 1 ...
modQsva :  num [1:333, 1:29] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
 $ sdev    : num [1:333] 15.09 5.45 4.18 3.05 2.77 ...
 $ rotation: num [1:1000, 1:333] 0.0402 0.051 0.0343 0.0431 0.035 ...
 $ center  : Named num [1:1000] 4.47 5.62 5.34 6.27 4.55 ...
 $ scale   : logi FALSE
 $ x       : num [1:333, 1:333] -5.05 8.6 4.2 17.14 21.68 ...
qSVs :  num [1:333, 1:16] -5.05 8.6 4.2 17.14 21.68 ...
keepIndex mod modQsva qsvBonf qSVs
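
keepIndex stores R positions, which are 1-based (the HIPPO vector above starts at 454). If the objects are ever handled outside R, the positions need shifting by one. The sketch below shows that conversion in Python on a toy matrix; `subset_columns` and the toy sample names are hypothetical.

```python
def subset_columns(matrix, keep_index_1based):
    """Select columns of a row-major matrix using R-style (1-based) positions."""
    n_cols = len(matrix[0])
    cols = [i - 1 for i in keep_index_1based]  # shift to 0-based positions
    if any(c < 0 or c >= n_cols for c in cols):
        raise IndexError("keepIndex position outside the matrix")
    return [[row[c] for c in cols] for row in matrix]


# Toy 1 x 10 "matrix" of made-up sample names standing in for the columns of an RSE object.
toy_samples = [[f"S{i}" for i in range(1, 11)]]
kept = subset_columns(toy_samples, [4, 5, 6])
print(kept)
```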

brainseq_phase2_qsvs_age17_noHGold.Rdata

Joint DLPFC and HIPPO qSVs without the HIPPO 'RiboZero Gold' samples.

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| qsvBonf | prcomp | qSVs in the original object format |
| qSVs | matrix | Matrix of qSVs used for building the model matrices |
| mod | matrix | Model matrix without qSVs |
| modQsva | matrix | Model matrix with qSVs |
| keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
  • Details:
keepIndex :  int [1:712] 1 2 3 4 5 6 7 8 9 10 ...
mod :  num [1:712, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
modQsva :  num [1:712, 1:36] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
 $ sdev    : num [1:712] 18.05 5.07 3.77 3.47 2.34 ...
 $ rotation: num [1:1000, 1:712] 0.0361 0.0499 0.0354 0.0376 0.0375 ...
 $ center  : Named num [1:1000] 4.83 6.14 5.77 6.56 5.04 ...
 $ scale   : logi FALSE
 $ x       : num [1:712, 1:712] -13.9 14.1 23.8 22.8 -12.5 ...
qSVs :  num [1:712, 1:22] -13.9 14.1 23.8 22.8 -12.5 ...
keepIndex mod modQsva qsvBonf qSVs
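
The dimensions reported for the three qSV files are consistent with modQsva being mod with the qSVs appended as extra columns (13 + 15 = 28 for DLPFC, 13 + 16 = 29 for HIPPO, 14 + 22 = 36 for the joint set). The Python sketch below checks that arithmetic with zero-filled placeholder matrices. That the columns are appended in this order is an assumption; check the linked scripts for the actual construction.

```python
def cbind(left, right):
    """Column-bind two row-major matrices that have the same number of rows."""
    if len(left) != len(right):
        raise ValueError("matrices must have the same number of rows")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def shape(matrix):
    """Return (rows, columns) of a row-major matrix."""
    return (len(matrix), len(matrix[0]) if matrix else 0)


def zeros(n_rows, n_cols):
    """Placeholder matrix; the real values live in the .Rdata files."""
    return [[0.0] * n_cols for _ in range(n_rows)]


# Dimensions copied from the Details listings above.
REPORTED = {
    "DLPFC": {"mod": (379, 13), "qSVs": (379, 15), "modQsva": (379, 28)},
    "HIPPO": {"mod": (333, 13), "qSVs": (333, 16), "modQsva": (333, 29)},
    "joint": {"mod": (712, 14), "qSVs": (712, 22), "modQsva": (712, 36)},
}

combined_shapes = {}
for name, dims in REPORTED.items():
    combined = cbind(zeros(*dims["mod"]), zeros(*dims["qSVs"]))
    combined_shapes[name] = shape(combined)
    print(name, combined_shapes[name], combined_shapes[name] == dims["modQsva"])
```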

limma_casecontrol_interaction_exon.Rdata

Results for the interaction between SCZD diagnosis (cases vs neurotypical controls) and brain region (DLPFC, HIPPO) at the exon feature level.

  • JHPCE path: /dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_exon.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
| fit | MArrayLM | Output from limma::eBayes() with the DE model results |
| top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
| exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
  • Details:
corfit : List of 3
 $ consensus.correlation: num 0.174
 $ cor                  : num 0.174
 $ atanh.correlations   : num [1:396579] 0.109 0.3918 0.0957 0.1391 0.263 ...
exprsNorm :  num [1:396579, 1:712] -7.127 -1.77 -1.455 -0.978 -0.209 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame':	396579 obs. of  6 variables:
 $ logFC    : num  0.1133 -0.1337 -0.0808 0.097 0.025 ...
 $ AveExpr  : num  -5.61 -1.77 -2.9 -2.38 -1.97 ...
 $ t        : num  0.586 -1.131 -0.603 0.989 0.275 ...
 $ P.Value  : num  0.558 0.259 0.547 0.323 0.783 ...
 $ adj.P.Val: num  0.895 0.749 0.891 0.79 0.958 ...
 $ B        : num  -4.86 -5.08 -5.2 -5.07 -5.56 ...
corfit exprsNorm fit top
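
For readers unfamiliar with the corfit elements: in limma, the consensus correlation is a trimmed mean of the per-feature correlations on the atanh scale, transformed back with tanh. The Python sketch below illustrates that calculation. The trim fraction of 0.15 is limma's default and is assumed here, since this README does not state the value used. Only the first five of the 396579 exon-level values are available in the listing above, so the sketch is not expected to reproduce the reported 0.174.

```python
import math


def trimmed_mean(values, trim=0.15):
    """Mean after dropping floor(n * trim) values from each end, as R's mean(x, trim) does."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def consensus_correlation(atanh_correlations, trim=0.15):
    """Back-transform the trimmed mean of atanh-scale correlations, ignoring NaN."""
    finite = [v for v in atanh_correlations if not math.isnan(v)]
    return math.tanh(trimmed_mean(finite, trim))


# First five atanh.correlations shown above for the exon level (a tiny subset).
first_five = [0.109, 0.3918, 0.0957, 0.1391, 0.263]
print(consensus_correlation(first_five))
```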

limma_casecontrol_interaction_gene.Rdata

Results for the interaction between SCZD diagnosis (cases vs neurotypical controls) and brain region (DLPFC, HIPPO) at the gene feature level.

  • JHPCE path: /dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_gene.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
| fit | MArrayLM | Output from limma::eBayes() with the DE model results |
| top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
| exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
  • Details:
corfit : List of 3
 $ consensus.correlation: num 0.275
 $ cor                  : num 0.275
 $ atanh.correlations   : num [1:24652] 0.601 0.298 0.357 0.548 0.824 ...
exprsNorm :  num [1:24652, 1:712] 1.379 0.403 -3.751 0.983 2.263 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame':	24652 obs. of  6 variables:
 $ logFC    : num  0.02036 0.04351 -0.06924 0.00926 -0.03694 ...
 $ AveExpr  : num  0.883 -2.08 -3.167 1.411 1.417 ...
 $ t        : num  0.289 0.362 -0.428 0.152 -0.439 ...
 $ P.Value  : num  0.773 0.717 0.669 0.879 0.661 ...
 $ adj.P.Val: num  0.95 0.937 0.924 0.976 0.922 ...
 $ B        : num  -5.7 -5.16 -5.02 -5.84 -5.76 ...
corfit exprsNorm fit top

limma_casecontrol_interaction_jxn.Rdata

Results for the interaction between SCZD diagnosis (cases vs neurotypical controls) and brain region (DLPFC, HIPPO) at the junction feature level.

  • JHPCE path: /dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_jxn.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
| fit | MArrayLM | Output from limma::eBayes() with the DE model results |
| top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
| exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
  • Details:
corfit : List of 3
 $ consensus.correlation: num 0.109
 $ cor                  : num 0.109
 $ atanh.correlations   : num [1:297181] 0.06038 0.04695 0.18061 0.00298 0.02972 ...
exprsNorm :  num [1:297181, 1:712] -4.97 -4.97 -4.97 -4.97 -4.97 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame':	297181 obs. of  6 variables:
 $ logFC    : num  -0.206 -0.3286 -0.0249 -0.3713 0.1121 ...
 $ AveExpr  : num  -4.02 -3.72 -3.35 -4.2 -3.41 ...
 $ t        : num  -1.393 -2.046 -0.131 -2.826 0.677 ...
 $ P.Value  : num  0.16415 0.04113 0.89607 0.00485 0.49882 ...
 $ adj.P.Val: num  0.78 0.599 0.99 0.349 0.923 ...
 $ B        : num  -4.56 -3.72 -5.25 -2.29 -5.1 ...
corfit exprsNorm fit top

limma_casecontrol_interaction_tx.Rdata

Results for the interaction between SCZD diagnosis (cases vs neurotypical controls) and brain region (DLPFC, HIPPO) at the transcript feature level.

  • JHPCE path: /dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_tx.Rdata
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
| fit | MArrayLM | Output from limma::eBayes() with the DE model results |
| top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
| exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
  • Details:
corfit : List of 3
 $ consensus.correlation: num 0.144
 $ cor                  : num 0.144
 $ atanh.correlations   : num [1:92732] 0.356 0.111 0.418 0.315 0.266 ...
exprsNorm :  num [1:92732, 1:712] 1.514 1.791 0.934 0.593 -1 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame':	92732 obs. of  6 variables:
 $ logFC    : num  -0.0749 -0.1086 -0.0715 -0.0267 -0.1306 ...
 $ AveExpr  : num  1.267 1.854 0.601 0.812 -0.251 ...
 $ t        : num  -1.42 -2.15 -1.8 -1.05 -1.71 ...
 $ P.Value  : num  0.1553 0.032 0.0719 0.2932 0.0882 ...
 $ adj.P.Val: num  0.702 0.507 0.597 0.799 0.625 ...
 $ B        : num  -4.76 -3.63 -4.22 -5.16 -4.37 ...
corfit exprsNorm fit top
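
The adj.P.Val column in the limma::topTable() output holds p-values adjusted for multiple testing. topTable() uses the Benjamini-Hochberg method by default; this README does not say whether the scripts changed that default, so the Python sketch below only illustrates the default adjustment, using made-up p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order (1-based)
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


toy_pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up values
toy_adjusted = bh_adjust(toy_pvalues)
print(toy_adjusted)
```

Because the adjustment depends on the number of tests, adj.P.Val values are only comparable within one feature level (for example, the 24652 genes or the 396579 exons listed above).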

dxStats_dlpfc_filtered_qSVA_geneLevel_noHGoldQSV_matchDLPFC.rda

DLPFC SCZD case vs neurotypical control DE analysis at the gene level. Also contains results for models that either don't adjust for qSVs or don't adjust for any covariates at all (naive model).

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_dlpfc_filtered_qSVA_geneLevel_noHGoldQSV_matchDLPFC.rda
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outGene0 | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs |
| outGeneNoAdj | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs or any other variables |
  • Details:
outGene : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.02194 0.00771 0.14248 -0.0459 -0.04052 ...
 $ AveExpr     : num  1.35 -1.68 -2.82 1.88 1.88 ...
 $ t           : num  -0.3856 0.0862 1.112 -0.9079 -0.5623 ...
 $ P.Value     : num  0.7 0.931 0.267 0.365 0.574 ...
 $ adj.P.Val   : num  0.91 0.982 0.661 0.739 0.855 ...
 $ B           : num  -5.82 -5.29 -4.73 -5.62 -5.86 ...
outGene0 : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.07 0.0164 0.335 -0.0653 -0.0333 ...
 $ AveExpr     : num  1.35 -1.68 -2.82 1.88 1.88 ...
 $ t           : num  -1.238 0.158 2.634 -1.337 -0.485 ...
 $ P.Value     : num  0.21667 0.87469 0.00878 0.18209 0.62773 ...
 $ adj.P.Val   : num  0.556 0.959 0.109 0.516 0.852 ...
 $ B           : num  -5.4 -5.61 -2.63 -5.36 -6.11 ...
outGeneNoAdj : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.0793 0.0646 0.4536 -0.0197 0.0293 ...
 $ AveExpr     : num  1.35 -1.68 -2.82 1.88 1.88 ...
 $ t           : num  -1.377 0.636 3.454 -0.393 0.441 ...
 $ P.Value     : num  0.169352 0.525159 0.000613 0.694524 0.659251 ...
 $ adj.P.Val   : num  0.3082 0.6674 0.00567 0.80175 0.77435 ...
 $ B           : num  -5.24 -5.497 -0.556 -6.156 -6.139 ...
outGene outGene0 outGeneNoAdj
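
To compare the three models, one option is to list the genes that pass an adjusted p-value cutoff in each. The Python sketch below does this for the first four genes shown in the Details above, using their gencodeID and adj.P.Val values as printed. The 0.05 cutoff and the `significant_ids` helper are illustrative assumptions, not the thresholds used in the publication.

```python
# gencodeID and adj.P.Val values copied from the first four rows of the listings above.
GENCODE_IDS = [
    "ENSG00000227232.5",
    "ENSG00000278267.1",
    "ENSG00000269981.1",
    "ENSG00000279457.3",
]
ADJ_P_BY_MODEL = {
    "outGene": [0.91, 0.982, 0.661, 0.739],            # adjusting for qSVs
    "outGene0": [0.556, 0.959, 0.109, 0.516],          # no qSV adjustment
    "outGeneNoAdj": [0.3082, 0.6674, 0.00567, 0.80175],  # naive model
}


def significant_ids(ids, adjusted_pvalues, cutoff=0.05):
    """Return the IDs whose adjusted p-value is below the (hypothetical) cutoff."""
    return [gene for gene, p in zip(ids, adjusted_pvalues) if p < cutoff]


hits_by_model = {
    model: significant_ids(GENCODE_IDS, adj_p)
    for model, adj_p in ADJ_P_BY_MODEL.items()
}
for model, hits in hits_by_model.items():
    print(model, hits)
```

On these four rows only the naive model flags a gene (ENSG00000269981.1), which illustrates how a feature can appear significant when covariates and qSVs are left out.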

dxStats_dlpfc_filtered_qSVA_noHGoldQSV_matchDLPFC.rda

DLPFC SCZD case vs neurotypical control DE analysis for each of the feature levels (gene, exon, junction, transcript) adjusting for qSVs.

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_dlpfc_filtered_qSVA_noHGoldQSV_matchDLPFC.rda
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outExon | data.frame | Output from limma::topTable() with the exon annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outJxn | data.frame | Output from limma::topTable() with the junction annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outTx | DataFrame | Output from limma::topTable() with the transcript annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
  • Details:
outExon : 'data.frame':	396583 obs. of  17 variables:
 $ Length      : int  37 154 99 147 137 136 198 159 152 34 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" ...
 $ Symbol      : chr  "WASH7P" "WASH7P" "WASH7P" "WASH7P" ...
 $ EntrezID    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  0.638 2.354 1.577 1.498 2.112 ...
 $ NumTx       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ gencodeTx   :List of 396583
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.0388 -0.0452 -0.0867 -0.0564 -0.0758 ...
 $ AveExpr     : num  -5.42 -1.53 -2.77 -2.2 -1.82 ...
 $ t           : num  -0.248 -0.521 -0.837 -0.753 -1.059 ...
 $ P.Value     : num  0.805 0.603 0.403 0.452 0.29 ...
 $ adj.P.Val   : num  0.962 0.909 0.829 0.852 0.763 ...
 $ B           : num  -4.94 -5.62 -5.11 -5.31 -5.16 ...
outGene : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.02194 0.00771 0.14248 -0.0459 -0.04052 ...
 $ AveExpr     : num  1.35 -1.68 -2.82 1.88 1.88 ...
 $ t           : num  -0.3856 0.0862 1.112 -0.9079 -0.5623 ...
 $ P.Value     : num  0.7 0.931 0.267 0.365 0.574 ...
 $ adj.P.Val   : num  0.91 0.982 0.661 0.739 0.855 ...
 $ B           : num  -5.82 -5.29 -4.73 -5.62 -5.86 ...
outJxn : 'data.frame':	297181 obs. of  24 variables:
 $ inGencode     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ inGencodeStart: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ inGencodeEnd  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ gencodeGeneID : chr  NA NA NA NA ...
 $ ensemblID     : chr  NA NA NA NA ...
 $ Symbol        : chr  NA NA NA NA ...
 $ gencodeStrand : chr  NA NA NA NA ...
 $ gencodeTx     :List of 297181
 $ numTx         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Class         : chr  "Novel" "Novel" "Novel" "Novel" ...
 $ startExon     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ endExon       : int  NA NA NA NA NA NA NA NA NA NA ...
 $ newGeneID     : chr  NA NA NA NA ...
 $ newGeneSymbol : chr  NA NA NA NA ...
 $ isFusion      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ meanExprs     : num  0.5 0.694 1.084 0.659 1.002 ...
 $ Length        : num  100 100 100 100 100 100 100 100 100 100 ...
 $ passExprsCut  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC         : num  0.1667 0.0291 -0.0933 -0.029 -0.1667 ...
 $ AveExpr       : num  -4.35 -3.98 -3.53 -4.49 -3.7 ...
 $ t             : num  1.606 0.245 -0.647 -0.302 -1.38 ...
 $ P.Value       : num  0.109 0.807 0.518 0.763 0.169 ...
 $ adj.P.Val     : num  0.781 0.972 0.959 0.964 0.834 ...
 $ B             : num  -4.29 -5.12 -5 -5.11 -4.51 ...
outTx : Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
outExon outGene outJxn outTx

dxStats_hippo_filtered_qSVA_geneLevel_noHGoldQSV_matchHIPPO.rda

HIPPO SCZD case vs neurotypical control DE analysis at the gene level. Also contains results for models that either don't adjust for qSVs or don't adjust for any covariates at all (naive model).

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_hippo_filtered_qSVA_geneLevel_noHGoldQSV_matchHIPPO.rda
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outGene0 | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs |
| outGeneNoAdj | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs or any other variables |
  • Details:
outGene : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  0.0326 0.0477 0.2136 -0.0298 -0.0338 ...
 $ AveExpr     : num  0.356 -2.536 -3.559 0.882 0.889 ...
 $ t           : num  0.46 0.396 1.383 -0.514 -0.405 ...
 $ P.Value     : num  0.646 0.692 0.168 0.608 0.685 ...
 $ adj.P.Val   : num  0.913 0.927 0.647 0.901 0.925 ...
 $ B           : num  -5.64 -5.16 -4.49 -5.72 -5.77 ...
outGene0 : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.0143 -0.0688 0.3141 -0.0869 -0.0372 ...
 $ AveExpr     : num  0.356 -2.536 -3.559 0.882 0.889 ...
 $ t           : num  -0.195 -0.547 2.019 -1.326 -0.448 ...
 $ P.Value     : num  0.8452 0.5847 0.0443 0.1858 0.6542 ...
 $ adj.P.Val   : num  0.972 0.9 0.452 0.681 0.923 ...
 $ B           : num  -5.81 -5.3 -3.77 -5.08 -5.81 ...
outGeneNoAdj : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  -0.0323 -0.021 0.3893 -0.0125 0.0582 ...
 $ AveExpr     : num  0.356 -2.536 -3.559 0.882 0.889 ...
 $ t           : num  -0.477 -0.191 2.778 -0.201 0.777 ...
 $ P.Value     : num  0.6335 0.84875 0.00579 0.84089 0.43771 ...
 $ adj.P.Val   : num  0.7615 0.9091 0.0275 0.9039 0.6007 ...
 $ B           : num  -5.97 -5.65 -2.3 -6.12 -5.86 ...
outGene outGene0 outGeneNoAdj

dxStats_hippo_filtered_qSVA_noHGoldQSV_matchHIPPO.rda

HIPPO SCZD case vs neurotypical control DE analysis for each of the feature levels (gene, exon, junction, transcript) adjusting for qSVs.

  • JHPCE path: /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_hippo_filtered_qSVA_noHGoldQSV_matchHIPPO.rda
  • Script
  • Contents:
| object | class | description |
| --- | --- | --- |
| outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outExon | data.frame | Output from limma::topTable() with the exon annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outJxn | data.frame | Output from limma::topTable() with the junction annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
| outTx | DataFrame | Output from limma::topTable() with the transcript annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
  • Details:
outExon : 'data.frame':	396583 obs. of  17 variables:
 $ Length      : int  37 154 99 147 137 136 198 159 152 34 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" ...
 $ Symbol      : chr  "WASH7P" "WASH7P" "WASH7P" "WASH7P" ...
 $ EntrezID    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  0.638 2.354 1.577 1.498 2.112 ...
 $ NumTx       : int  1 1 1 1 1 1 1 1 1 1 ...
 $ gencodeTx   :List of 396583
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  0.1345 -0.1166 -0.1364 0.0472 -0.0376 ...
 $ AveExpr     : num  -5.82 -2.06 -3.06 -2.58 -2.14 ...
 $ t           : num  0.87 -1.043 -1.154 0.52 -0.483 ...
 $ P.Value     : num  0.385 0.298 0.249 0.603 0.629 ...
 $ adj.P.Val   : num  0.85 0.807 0.778 0.924 0.931 ...
 $ B           : num  -4.76 -5.12 -4.81 -5.34 -5.47 ...
outGene : 'data.frame':	24652 obs. of  17 variables:
 $ Length      : int  1351 68 284 1982 4039 385 372 1044 1543 89 ...
 $ gencodeID   : chr  "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
 $ ensemblID   : chr  "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
 $ gene_type   : chr  "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
 $ Symbol      : chr  "WASH7P" "MIR6859-1" "" "" ...
 $ EntrezID    : int  NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
 $ Class       : chr  "InGen" "InGen" "InGen" "InGen" ...
 $ meanExprs   : num  1.697 4.355 0.491 1.587 0.733 ...
 $ NumTx       : int  1 1 1 3 5 1 1 1 1 1 ...
 $ gencodeTx   :List of 24652
 $ passExprsCut: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC       : num  0.0326 0.0477 0.2136 -0.0298 -0.0338 ...
 $ AveExpr     : num  0.356 -2.536 -3.559 0.882 0.889 ...
 $ t           : num  0.46 0.396 1.383 -0.514 -0.405 ...
 $ P.Value     : num  0.646 0.692 0.168 0.608 0.685 ...
 $ adj.P.Val   : num  0.913 0.927 0.647 0.901 0.925 ...
 $ B           : num  -5.64 -5.16 -4.49 -5.72 -5.77 ...
outJxn : 'data.frame':	297181 obs. of  24 variables:
 $ inGencode     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ inGencodeStart: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ inGencodeEnd  : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ gencodeGeneID : chr  NA NA NA NA ...
 $ ensemblID     : chr  NA NA NA NA ...
 $ Symbol        : chr  NA NA NA NA ...
 $ gencodeStrand : chr  NA NA NA NA ...
 $ gencodeTx     :List of 297181
 $ numTx         : int  0 0 0 0 0 0 0 0 0 0 ...
 $ Class         : chr  "Novel" "Novel" "Novel" "Novel" ...
 $ startExon     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ endExon       : int  NA NA NA NA NA NA NA NA NA NA ...
 $ newGeneID     : chr  NA NA NA NA ...
 $ newGeneSymbol : chr  NA NA NA NA ...
 $ isFusion      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ meanExprs     : num  0.5 0.694 1.084 0.659 1.002 ...
 $ Length        : num  100 100 100 100 100 100 100 100 100 100 ...
 $ passExprsCut  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
 $ logFC         : num  -0.0328 -0.1762 -0.1837 -0.24 0.0422 ...
 $ AveExpr       : num  -3.66 -3.41 -3.15 -3.88 -3.08 ...
 $ t             : num  -0.239 -1.222 -1.114 -1.962 0.286 ...
 $ P.Value       : num  0.8116 0.2228 0.266 0.0506 0.7747 ...
 $ adj.P.Val     : num  0.983 0.873 0.89 0.727 0.98 ...
 $ B             : num  -5.13 -4.64 -4.72 -3.86 -5.12 ...
outTx : Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
outExon outGene outJxn outTx

methprop_pd.Rdata

f9

  • Script that created it: export.sh
  • Contents:

TODO

DLPFC_weights.tar.gz

f10

  • Script that created it: export.sh
  • Contents:

TODO

HIPPO_weights.tar.gz

f11

  • Script that created it: export.sh
  • Contents:

TODO

TWAS_results.tar.gz

f12

  • Script that created it: export.sh
  • Contents:

TODO
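
Until the contents of the `.tar.gz` archives listed above (DLPFC_weights.tar.gz, HIPPO_weights.tar.gz, TWAS_results.tar.gz) are documented, they can be inspected without extracting them. The following is a minimal sketch; the helper names `list_archive` and `count_archive_members` are hypothetical and not part of this repository, and nothing is assumed about what the archives actually contain.

```shell
# list_archive ARCHIVE
# Print the member paths of a gzip-compressed tar archive without extracting it.
list_archive() {
    tar -tzf "$1"
}

# count_archive_members ARCHIVE
# Print the number of entries (files and directories) stored in the archive.
count_archive_members() {
    tar -tzf "$1" | wc -l | tr -d ' '
}
```

For example, `list_archive TWAS_results.tar.gz | head` shows the first few member paths once the archive has been downloaded.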

DLPFC FASTQ

f13

$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/
6.0T	/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/

These files are named after the SAMPLE_ID CharacterList values stored in the phenotype tables (see any RSE object or the phenotype table). The Globus endpoint also includes the FASTQ files for samples that were excluded by this R script.

HIPPO FASTQ

f14

$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/Hippo_RiboZero/brainseq/hippo/merged_fastq/
5.5T	/dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/Hippo_RiboZero/brainseq/hippo/merged_fastq/

These files are named after the SAMPLE_ID CharacterList values stored in the phenotype tables (see any RSE object or the phenotype table). The Globus endpoint also includes the FASTQ files for samples that were excluded by this R script.
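
Because the FASTQ files are named after the SAMPLE_ID values, a downloaded copy can be checked against a list of sample IDs. The sketch below assumes a plain-text file with one SAMPLE_ID per line (exported from the phenotype table by whatever means you prefer) and assumes file names of the form `<SAMPLE_ID>*.fastq.gz`. Both the helper name `missing_fastq` and that file name pattern are illustrative assumptions, not something stated in this README.

```shell
# missing_fastq DIR IDS_FILE
# For each sample ID (one per line in IDS_FILE), print the ID when DIR holds
# no file whose name starts with that ID and ends in ".fastq.gz".
# The "<SAMPLE_ID>*.fastq.gz" pattern is an assumption; adjust as needed.
missing_fastq() {
    dir=$1
    ids_file=$2
    # The "|| [ -n "$id" ]" part keeps a final line that lacks a newline.
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines.
        [ -n "$id" ] || continue
        found=0
        for f in "$dir"/"$id"*.fastq.gz; do
            # An unmatched glob stays literal, so test that the path exists.
            if [ -e "$f" ]; then
                found=1
                break
            fi
        done
        [ "$found" -eq 1 ] || printf '%s\n' "$id"
    done < "$ids_file"
}
```

Note that this is a prefix match, so an ID that is a prefix of another ID can be reported as present when only the longer one exists. Tighten the pattern once the real naming scheme is confirmed.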

BAM files

f15

$ du -sh /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/
6.5T	/dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/

BSP1 hg38

f16

This is the BrainSEQ Phase 1 data (DOI: 10.1038/s41593-018-0197-y), re-processed using hg38 (the originally published data used hg19) and subset to the genes, exons, exon-exon junctions, and transcripts expressed in BrainSEQ Phase 2 using the script bsp1/data/subset_bsp1.R.

  • JHPCE path: /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/
  • Contents:
$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1*
2.0G	/dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_exon.Rdata
157M	/dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_gene.Rdata
635M	/dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_jxn.Rdata
373M	/dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_tx.Rdata
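
The listing above shows one file per feature level (exon, gene, jxn, tx). A small check like the following can confirm that a local copy of the directory has all four. The helper name `check_bsp1_files` is hypothetical; the file names are taken from the listing above.

```shell
# check_bsp1_files DIR
# Print the name of each expected feature-level file that is absent from DIR.
# Returns 0 when all four files are present, 1 otherwise.
check_bsp1_files() {
    dir=$1
    status=0
    for feature in gene exon jxn tx; do
        file="bsp1_${feature}.Rdata"
        if [ ! -f "$dir/$file" ]; then
            printf '%s\n' "$file"
            status=1
        fi
    done
    return "$status"
}
```

On JHPCE this could be run as `check_bsp1_files /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data`.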

LIBD internal:

JHPCE location: /dcl01/lieber/ajaffe/lab/brainseq_phase2
NOTE: as of 2023, the internal location is /dcs04/lieber/lcolladotor/BrainSEQ_LIBD001/brainseq_phase2