16p_resource

Introduction

This repo contains the code used to run the analyses and generate the figures for Roth, Muench et al. The paper describes a new resource of patient-derived iPSCs bearing a 16p11.2 copy number variant, explores the potential utility of these clones, and describes the possible impact of clonal integration on iPSC-derived tissue models. I have written this README with other biologists in mind who might be interested in following up on our analyses or investigating their own integration effects.

The code is divided into two sections. The section names are misnomers left over from an earlier revision:

"figure5": contains a differential expression analysis of the integration-negative clones, using reads aligned with STAR and counted with htseq-count.
"figure6": contains an independent bioinformatic comparison of integration-negative and integration-positive clones, using reads pseudoaligned with kallisto.

Getting Started

Data

The data will be made available on GEO (under embargo during revisions as of October 12, 2020).

Dependencies

Figure 5

setup.Rmd

Install the following

DESeq.Rmd

In addition to the packages required for setup.Rmd, install the following:
- RColorBrewer

Figure 6

tximport_Setup.Rmd

Install the following

deseq.Rmd

Install the following
- DESeqAid Package
- DESeq2 Package

heatmaps.Rmd

Install the following
- RColorBrewer

barPlots.Rmd

In addition to the packages required for Setup, install the following
- reshape2

GSEA.Rmd

GSEA .jar file
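
Before running anything, it can help to check which of the R packages named above are already installed. The following is a minimal sketch in base R, not part of the repository: the helper `missing_pkgs` is hypothetical, and DESeqAid is left out because this README does not say where to install it from. RColorBrewer and reshape2 are distributed on CRAN, and DESeq2 is distributed through Bioconductor.

```r
# Packages named in this README, grouped by where they are distributed.
# DESeqAid is omitted: its install source is not stated in this README.
cran_pkgs <- c("RColorBrewer", "reshape2")
bioc_pkgs <- c("DESeq2")

# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

need_cran <- missing_pkgs(cran_pkgs)
need_bioc <- missing_pkgs(bioc_pkgs)

# Report what is missing instead of installing automatically.
if (length(need_cran) > 0) {
  message("Missing CRAN packages: ", paste(need_cran, collapse = ", "))
}
if (length(need_bioc) > 0) {
  message("Missing Bioconductor packages: ", paste(need_bioc, collapse = ", "))
}
```

Missing CRAN packages can then be installed with `install.packages()`, and Bioconductor packages with `BiocManager::install()`.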

How to run

1. Fill out userVars.csv.

I thought it might be easier to import and document variables using this spreadsheet rather than using a .bashrc file.
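
As a rough illustration of this approach, the sketch below reads a two-column spreadsheet into a named list in R. The column names `variable` and `value` and the example rows are hypothetical; check userVars.csv in the repository for the actual layout.

```r
# Hypothetical stand-in for userVars.csv: the column names and rows below
# are assumptions for illustration, not the real file's contents.
example_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "variable,value",
  "dateOfRun,2020-10-12",
  "outputDir,output"
), example_csv)

# Read every column as character so dates and paths are not converted.
user_vars_table <- read.csv(example_csv, colClasses = "character")

# Turn the table into a named list for lookup by variable name.
user_vars <- setNames(as.list(user_vars_table$value), user_vars_table$variable)

print(user_vars$dateOfRun)
```

Keeping the settings in one spreadsheet means each variable sits next to its documentation, and nothing has to be exported into the shell environment first.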

2. Run the Rmd files.

Within each figure directory, the code has been broken up into several parts. You should run the code in this order:

Figure 5

setup.Rmd
deseq.Rmd

Figure 6

tximport_setup.Rmd
deseq.Rmd
barPlots.Rmd OR heatmaps.Rmd OR GSEA.Rmd
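
If you prefer to run the files from the R console rather than knitting them one by one in RStudio, the order above could be scripted roughly as follows. This is a sketch, not something shipped with the repository: `render_in_order` is a hypothetical helper, the directory path is a placeholder, and the file names are copied from this README, so adjust their casing to match the files on disk. It assumes the rmarkdown package is installed.

```r
# Run order as listed above. For Figure 6, the last step is whichever of
# barPlots.Rmd, heatmaps.Rmd, or GSEA.Rmd you need; barPlots.Rmd is shown here.
figure5_order <- c("setup.Rmd", "deseq.Rmd")
figure6_order <- c("tximport_setup.Rmd", "deseq.Rmd", "barPlots.Rmd")

# Hypothetical helper: render each file in order and stop at the first
# file that cannot be found, so later steps never run on missing input.
render_in_order <- function(dir, files) {
  for (f in files) {
    path <- file.path(dir, f)
    if (!file.exists(path)) stop("Missing file: ", path)
    rmarkdown::render(path)
  }
  invisible(files)
}

# Example call (placeholder path, replace with the real figure directory):
# render_in_order("path/to/figure5", figure5_order)
```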

The code writes a separate output file for each distinct run date, where the run date is defined in userVars.csv. This way, you can keep copies of all output as you make small tweaks to the code.
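
The idea of date-stamped output can be sketched in base R as below. The naming scheme (run date as a file-name prefix), the helper `dated_output_path`, and the toy values written to disk are all assumptions for illustration; the repository's scripts may name their output differently.

```r
# Hypothetical helper: build an output path that carries the run date,
# e.g. <out_dir>/2020-10-12_results.csv.
dated_output_path <- function(out_dir, date_of_run, file_name) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(out_dir, paste0(date_of_run, "_", file_name))
}

out_dir <- file.path(tempdir(), "output")
path_a <- dated_output_path(out_dir, "2020-10-12", "results.csv")
path_b <- dated_output_path(out_dir, "2020-10-13", "results.csv")

# Toy values, made up for this example. Each run date gets its own file,
# so results from an earlier run are not overwritten.
write.csv(data.frame(gene = "geneA", log2FC = 1.5), path_a, row.names = FALSE)
write.csv(data.frame(gene = "geneA", log2FC = 1.7), path_b, row.names = FALSE)

print(list.files(out_dir))
```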

Additional Info

For the alignment and counting steps, I used one of two approaches, depending on the figure:

STAR (Figure 5) with htseq-count
kallisto (Figure 6)

I performed both of these on the Stanford Center for Personalized Medicine Cluster. I recommend running STAR on a cluster. In theory, you should be able to run kallisto on a laptop.

I performed subsequent analyses using R and RStudio.

Versioning

For the versions available, see the tags on this repository.

Authors

Kristin Muench - GitHub: kmuench

Acknowledgments

Thank you to PurpleBooth for the README template.
Thank you to the Bader Lab for their GSEA tutorial.
Thank you to John Hanks at the SCGPM cluster and the team at the Stanford Functional Genomics Facility for their help supporting this work.
