###########################################
README for AQuA-HiChIP code, 2018
by Berkley Gryder, gryderart@gmail.com
###########################################
### Full DEMO: start with step a., run through step k., then skip step l. and use step m. instead
### Partial DEMO: skip to step l. to plot data in R. Step l. explains how to download the matrix and mergestat files into folders that mimic the HiC-Pro output structure
### System requirements:
### Full DEMO requirements:
a. Software dependencies to install:
HiC-Pro 2.10.0
bowtie 2-2.3.4
R 3.5.0
openmpi 3.0.0
GSL 2.4
gcc 7.2.0
samtools 1.8
juicer 1.5.6
python 2.7
b. Hardware suggestion:
HiC-Pro works well on a cluster node with 4 CPUs, 32 GB of memory, and 200 GB of scratch disk space
***not tested for operation on a "normal" desktop computer***
c. Installation:
Follow instructions for installation at the following websites:
https://github.com/nservant/HiC-Pro
https://github.com/aidenlab/juicer/wiki/Juicer-Tools-Quick-Start
https://www.rstudio.com/products/rstudio/download/#download
http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#obtaining-bowtie-2
### Partial DEMO requirements: R and RStudio only; can be performed on a "normal" desktop or laptop computer
***CODE IS SENSITIVE TO FOLDER PATHS, WHICH MUST BE UPDATED FROM THE DEMO TO MATCH YOUR LOCAL MACHINE***
d. Download FASTQ files from GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120770
e. Place FASTQ files into directories (2 fastq files per directory) with the following structure:
projects/
├── RH4_H3K27ac_HiChIP_hg19
│   └── DATA
│       ├── Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7
│       └── Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7
└── RH4_H3K27ac_HiChIP_mm10
    └── DATA
        ├── Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7
        └── Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7
***note: this uses the same FASTQ files for hg19 and mm10 in parallel***
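A minimal shell sketch of step e., assuming the downloaded FASTQ files sit in one subdirectory per sample under a source folder (the function name make_layout, its arguments, and the *.fastq.gz suffix are illustrative assumptions, not part of the AQuA-HiChIP code). It creates both genome trees and symlinks the same FASTQ files into each, so the data are stored only once. If your HiC-Pro installation does not follow symlinks, copy the files instead.

```shell
# make_layout ROOT FASTQ_SRC
#   ROOT      - directory that will contain the "projects" tree (hypothetical argument)
#   FASTQ_SRC - directory with one subdirectory per sample holding its FASTQ files (assumed layout)
make_layout() {
  local root="$1" fastq_src="$2"
  local genome sample dest fq
  for genome in hg19 mm10; do
    for sample in Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7 Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7; do
      dest="$root/projects/RH4_H3K27ac_HiChIP_${genome}/DATA/$sample"
      mkdir -p "$dest"
      # Link rather than copy, so the hg19 and mm10 trees share the same FASTQ files.
      for fq in "$fastq_src/$sample"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # nothing to link if the sample has no FASTQ files yet
        ln -sf "$(cd "$(dirname "$fq")" && pwd)/$(basename "$fq")" "$dest/"
      done
    done
  done
}
```

Example call, with placeholder paths: make_layout /data/mylab/HiC /data/mylab/downloads/GSE120770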
f. Prepare one HiC-Pro configuration file for each genome (human hg19 and mouse mm10).
DEMO configuration files and chrom*.sizes files here:
https://github.com/GryderArt/AQuA-HiChIP/tree/master/reference_files
g. Prepare bowtie 2 indexes for hg19 and mm10 (http://bowtie-bio.sourceforge.net/tutorial.shtml#newi)
(or, locate local instances of these already in use, and adjust configuration file accordingly)
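If no local index exists, one possible way to build it is sketched below. bowtie2-build takes a reference FASTA and an index basename; the wrapper function build_index, its arguments, and the BOWTIE2_BUILD override variable are illustrative assumptions. The resulting directory and basename are what the HiC-Pro configuration file needs to point at.

```shell
# Command used to build the index; override BOWTIE2_BUILD if the binary is not on PATH.
BOWTIE2_BUILD="${BOWTIE2_BUILD:-bowtie2-build}"

# build_index FASTA OUTDIR NAME
#   FASTA  - reference genome FASTA (e.g. ucsc.hg19.fasta)
#   OUTDIR - directory to hold the index files (hypothetical argument)
#   NAME   - index basename, e.g. hg19 or mm10
build_index() {
  local fasta="$1" outdir="$2" name="$3"
  [ -f "$fasta" ] || { echo "reference FASTA not found: $fasta" >&2; return 1; }
  mkdir -p "$outdir"
  # Writes $outdir/$name.*.bt2 index files
  "$BOWTIE2_BUILD" "$fasta" "$outdir/$name"
}
```

Example call, with placeholder paths: build_index ucsc.hg19.fasta ./reference_files/bowtie2_index hg19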
h. Prepare in silico digested genomes
example code for digesting hg19: /usr/local/Anaconda/envs_app/hicpro/2.10.0/HiC-Pro_2.10.0/bin/utils/digest_genome.py -r dpnii -o dpnii.ucsc.hg19.bed /data/khanlab/projects/HiC/reference_files/bowtie2_index/ucsc.hg19.fasta
i. Run HiC-Pro
-example code for mm10: /usr/local/Anaconda/envs_app/hicpro/2.10.0/HiC-Pro_2.10.0/bin/HiC-Pro -i /data/khanlab/projects/HiC/projects/RH4_H3K27ac_HiChIP_mm10/DATA/ -o /data/khanlab/projects/HiC/projects/RH4_H3K27ac_HiChIP_mm10/HiCpro_OUTPUT/ -c /data/khanlab/projects/HiC/reference_files/config_khanlab_mm10.txt -p
-this will create two shell scripts (step 1 and step 2) in the output folder, so change into that folder: cd /data/khanlab/projects/HiC/projects/RH4_H3K27ac_HiChIP_mm10/HiCpro_OUTPUT/
-then, run the shell script with: sbatch -J HiCstep1 --time=24:00:00 --mem=121g --cpus-per-task=4 --gres=lscratch:200 HiCPro_step1_mm10_HiCpro.sh
-once step 1 finishes, then run: sbatch -J HiCstep2 --time=24:00:00 --mem=121g --cpus-per-task=4 --gres=lscratch:200 HiCPro_step2_mm10_HiCpro.sh
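On a Slurm cluster the two submissions above can also be chained, so that step 2 starts only after step 1 finishes successfully. The sketch below uses the standard sbatch options --parsable and --dependency=afterok; the wrapper function submit_hicpro_steps and the SBATCH override variable are illustrative assumptions, and the resource options are copied from the commands above.

```shell
# Command used to submit jobs; override SBATCH only for dry runs or testing.
SBATCH="${SBATCH:-sbatch}"

# submit_hicpro_steps STEP1_SCRIPT STEP2_SCRIPT
submit_hicpro_steps() {
  local step1="$1" step2="$2" opts jid1
  opts="--time=24:00:00 --mem=121g --cpus-per-task=4 --gres=lscratch:200"
  # --parsable makes sbatch print only the job ID (optionally followed by ";cluster").
  # $opts is left unquoted on purpose so it splits into separate options.
  jid1=$("$SBATCH" --parsable -J HiCstep1 $opts "$step1") || return 1
  jid1=${jid1%%;*}
  # afterok: step 2 is released only if step 1 exits with status 0.
  "$SBATCH" --parsable -J HiCstep2 --dependency=afterok:"$jid1" $opts "$step2"
}
```

Example call from the HiCpro_OUTPUT folder: submit_hicpro_steps HiCPro_step1_mm10_HiCpro.sh HiCPro_step2_mm10_HiCpro.sh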
j. Convert to juicer compatible .hic format
example code for hg19: /usr/local/Anaconda/envs_app/hicpro/2.10.0/HiC-Pro_2.10.0/bin/utils/hicpro2juicebox.sh -i /data/khanlab/projects/HiC/projects/RH4_H3K27ac_HiChIP_hg19/HiCpro_OUTPUT/hic_results/data/Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7/Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7_allValidPairs -g hg19 -j /usr/local/apps/juicer/juicer-1.5.6/scripts/juicer_tools.jar
k. Extraction of DEMO matrix using the juicer tool "dump"
example code: java -jar /usr/local/apps/juicer/juicer-1.5.6/scripts/juicer_tools.jar dump observed NONE /data/khanlab/projects/HiC/projects/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7/Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7_allValidPairs.hic 11:17600000:17800000 11:17600000:17800000 BP 5000 Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7.chr11.176MB-178MB.5KB.matrix.txt
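A quick sanity check of the extracted matrix can be done before moving to R. The sketch below assumes the dump output is a sparse, whitespace-separated text file with three columns per line (start of bin 1, start of bin 2, contact count); verify this against your own file. The function name summarize_matrix is illustrative.

```shell
# summarize_matrix FILE
# Prints the number of non-empty bin pairs, the summed contact count,
# and the largest single count found in a three-column matrix file.
summarize_matrix() {
  awk 'NF == 3 { n++; total += $3; if ($3 > max) max = $3 }
       END { printf "entries=%d total=%g max=%g\n", n, total, max }' "$1"
}
```

Example call: summarize_matrix Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7.chr11.176MB-178MB.5KB.matrix.txt
An empty file or a very low total usually points to a wrong path or a region with no valid pairs.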
l. Use R-studio to visualize heatmap (Partial DEMO mode)
-download R code here: https://github.com/GryderArt/AQuA-HiChIP/blob/master/plotAQuA_Contactmaps_Virtual4C_DEMO.R
-to run a demo starting with pre-made matrices, create folders to mimic HiC-pro structure:
./projects/HiC/projects/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7/
./projects/HiC/projects/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7/
-into these folders, place sample DEMO matrix downloaded from: https://github.com/GryderArt/AQuA-HiChIP/tree/master/DEMO_data
./projects/HiC/projects/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7/Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7.chr11.176MB-178MB.5KB.matrix.txt
./projects/HiC/projects/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7/Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7.chr11.176MB-178MB.5KB.matrix.txt
-to get mm10 and hg19 sample stats in DEMO without running HiC-pro, download file: https://github.com/GryderArt/AQuA-HiChIP/blob/master/DEMO_data/mergestat.HiChIP.all.txt
place mergestat.HiChIP.all.txt in this folder: ./projects/HiC/projects/mergestat.HiChIP.all.txt
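The folder setup for the Partial DEMO can be sketched in shell as follows. The function creates the mimic folders listed above under a chosen base directory and reports any DEMO file that has not been placed yet; the function name check_partial_demo and its argument are illustrative assumptions. Downloading the files from GitHub is left as a manual step.

```shell
# check_partial_demo BASE
#   BASE - directory that contains (or will contain) ./projects/HiC/projects
# Returns 0 when all three DEMO files are in place, 1 otherwise.
check_partial_demo() {
  local base="$1/projects/HiC/projects"
  local missing=0 s dir f
  for s in Sample_RH4_D6_H3K27ac_HiChIP_HKJ22BGX7 Sample_RH4_Ent6_H3K27ac_HiChIP_HKJ22BGX7; do
    dir="$base/RH4_H3K27ac_HiChIP/HiCpro_OUTPUT/hic_results/data/$s"
    mkdir -p "$dir"
    f="$dir/$s.chr11.176MB-178MB.5KB.matrix.txt"
    [ -f "$f" ] || { echo "missing: $f"; missing=1; }
  done
  f="$base/mergestat.HiChIP.all.txt"
  [ -f "$f" ] || { echo "missing: $f"; missing=1; }
  return $missing
}
```

Example call from the directory used as the R working directory: check_partial_demo .
Run it once to create the folders, place the downloaded files, then run it again to confirm nothing is missing.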
-run R-code: Steps 1, 2 (mostly commented out for the partial DEMO), 3 and 4 to generate heatmaps with AQuA normalization
-run R-code: Step 5 to demonstrate the Virtual 4C code
-plots should resemble those inline in the Protocol text
m. Use R-studio to visualize heatmap (full DEMO mode)
-download R code here: https://github.com/GryderArt/AQuA-HiChIP/blob/master/plotAQuA_Contactmaps_Virtual4C.R
-run R-code: Step 1 will set up parameters. Be sure to update the set working directory command, setwd(), to match your local drive path.
-run R-code: Step 2 will generate a stat summary for hg19 and mm10 contact frequencies: ./projects/HiC/projects/mergestat.HiChIP.all.txt
-run R-code: Steps 3 and 4 are used to generate heatmaps with AQuA normalization
-run R-code: Step 5 to demonstrate the Virtual 4C code
-plots should resemble those inline in the Protocol text