Update README.md
Setting myself some TODOs and clarifying what should be present here for publication.
fishercera committed May 24, 2019
1 parent 8afa755 commit e8fc954
Showing 1 changed file with 17 additions and 0 deletions.
17 changes: 17 additions & 0 deletions README.md
@@ -8,6 +8,23 @@ Software used as part of this repo:
- R (Bioconductor packages)


## RNA-Seq read processing pipeline
**TODO** Add scripts for the RNA QC pipeline and a README.
1. Quality-trim reads
2. Strip poly(A) tails
3. Align to ribosomal RNA and remove ribosomal reads
4. Assemble

**TODO** Make a small test data set.
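
As a rough illustration of steps 1 and 2 only, the following Python sketch trims low-quality bases from the 3' end of a read and then strips a trailing poly(A) run. It is not the pipeline's actual code (those scripts are still a TODO here); the function names, the Phred cutoff of 20, and the minimum tail length of 10 are all assumptions chosen for the example. Steps 3 and 4 depend on external aligners and assemblers and are not sketched.

```python
import re

# Hypothetical threshold: a run of at least 10 A's at the 3' end counts as a tail.
POLYA_TAIL = re.compile(r"A{10,}$")


def quality_trim(seq, quals, min_q=20):
    """Drop bases from the 3' end while their Phred score is below min_q.

    seq is the read sequence; quals is a list of integer Phred scores
    of the same length. The cutoff of 20 is an assumed default.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def strip_polya(seq, quals):
    """Remove a trailing poly(A) run, keeping qualities in sync."""
    match = POLYA_TAIL.search(seq)
    if match is None:
        return seq, quals
    return seq[:match.start()], quals[:match.start()]


def clean_read(seq, quals):
    """Apply step 1 (quality trim) and then step 2 (poly(A) strip)."""
    seq, quals = quality_trim(seq, quals)
    return strip_polya(seq, quals)


# Small usage example with a made-up read.
read = "ACGTACGT" + "A" * 12
scores = [30] * 18 + [5, 5]
cleaned_seq, cleaned_quals = clean_read(read, scores)
print(cleaned_seq, len(cleaned_quals))
```

Trimming by quality first means low-quality bases at the very end do not hide a poly(A) tail from the second step.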

## Annotation and Assembly Refinement
**TODO** Add scripts for using EnTAP to annotate reads, USEARCH to cluster proteomes, and scripts to select nucleotide sequences representative of the clustered proteome (the refined assembly).
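
Until those scripts are added, the last step (selecting nucleotide sequences for the clustered proteome) might look roughly like the sketch below. It assumes the cluster representatives are already available as a set of sequence IDs and that protein and nucleotide records share the same ID, which may not hold for a given assembly. The function names are hypothetical, and the code calls neither EnTAP nor USEARCH.

```python
def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines.

    The ID is taken as the first whitespace-delimited token of the header.
    """
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def select_representatives(fasta_lines, keep_ids):
    """Keep only nucleotide records whose ID is in keep_ids.

    keep_ids is assumed to hold the IDs of the cluster representatives.
    """
    return [(seq_id, seq) for seq_id, seq in read_fasta(fasta_lines)
            if seq_id in keep_ids]


# Usage with a made-up three-transcript assembly.
assembly = [">t1 len=4", "ACGT", ">t2", "GG", "CC", ">t3", "TTTT"]
refined = select_representatives(assembly, {"t1", "t3"})
for seq_id, seq in refined:
    print(f">{seq_id}\n{seq}")
```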

## GOSeq Walkthrough
Files include an R Notebook (GoSeq_Walkthrough.RMD) outlining the steps used to create the background for GoSeq from EnTAP annotations, create the named vectors that GoSeq will use, and run the enrichment analysis. Also included are the data files needed to run the notebook and the output produced by running it on those data files. The script GoTermsMap.py creates the one-to-one gene ID to GO term map from EnTAP annotations.
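
To make "one-to-one gene ID to GO term map" concrete, here is a minimal Python sketch that expands one annotation row per gene into one row per gene and GO term pair. It is not the contents of GoTermsMap.py. The input layout (a gene ID plus a free-text field with GO identifiers somewhere in it) and the function name `gene_go_pairs` are assumptions for illustration.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")


def gene_go_pairs(rows):
    """Expand (gene_id, go_field) rows into unique (gene_id, go_id) pairs.

    go_field is assumed to be free text containing zero or more GO IDs;
    any labels or separators around the IDs are ignored.
    """
    pairs = []
    seen = set()
    for gene_id, go_field in rows:
        for go_id in GO_ID.findall(go_field or ""):
            key = (gene_id, go_id)
            if key not in seen:
                seen.add(key)
                pairs.append(key)
    return pairs


# Usage with made-up annotation rows.
annotations = [
    ("gene1", "GO:0005524-ATP binding,GO:0016301"),
    ("gene2", ""),
    ("gene1", "GO:0005524"),
]
for gene_id, go_id in gene_go_pairs(annotations):
    print(gene_id, go_id, sep="\t")
```

Genes with no GO terms produce no rows, and repeated pairs are emitted once, so each output line links exactly one gene to one term.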

## TPM Normalization and Between-Species Scaling
Includes an R Notebook (best opened in RStudio) explaining the principles and formula for scaling transcripts per million (TPM) from one species' transcriptome to another's, so that gene expression in the two species can be compared more fairly. **TODO** Add data files for testing the code and examining the results of scaling.
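
For orientation, the sketch below computes standard TPM from read counts and transcript lengths, then shows one possible between-species adjustment: renormalizing each species' TPM over a shared gene set so that the shared genes sum to one million in both. The renormalization is an assumption made for illustration and is not necessarily the formula used in the notebook. All names and numbers are hypothetical.

```python
def tpm(counts, lengths):
    """Standard TPM: length-normalized rates scaled to sum to one million.

    counts and lengths are dicts keyed by gene ID.
    """
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def rescale_to_shared(tpm_values, shared_genes):
    """Renormalize TPM over a shared gene set (assumed scaling scheme).

    After this step the shared genes sum to one million, so two species
    restricted to the same orthologues are on the same scale.
    """
    subtotal = sum(tpm_values[gene] for gene in shared_genes)
    if subtotal == 0:
        raise ValueError("shared genes have zero total expression")
    return {gene: tpm_values[gene] / subtotal * 1e6 for gene in shared_genes}


# Usage with made-up counts: gene "c" has no orthologue in the other species.
counts = {"a": 10, "b": 20, "c": 30}
lengths = {"a": 1000, "b": 1000, "c": 1000}
species_tpm = tpm(counts, lengths)
scaled = rescale_to_shared(species_tpm, ["a", "b"])
print(scaled)
```

Restricting to shared genes before renormalizing keeps species-specific transcripts from shifting the scale of the genes that are actually being compared.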

## MLSeq/PLDA Classifier Walkthrough
This procedure uses scaled TPM values from two species for shared single-copy orthologues to find the minimum set of genes sufficient to distinguish one species from the other. Files include an R Notebook with the steps involved and code to examine the results of removing biased genes, a standalone R script meant to be run on a highly parallel compute cluster, and a small set of test data.
