Fidel Ramirez edited this page Feb 17, 2015 · 3 revisions

Tutorial

First, the two mates of each read pair are mapped individually. This avoids a possible bias from alignment software that, when given both mates together, may try to position them close to each other.

Because a fraction of Hi-C reads contain the ligation site, they will not align end-to-end to the reference genome. For this reason, it is advisable to use local alignment instead. In bowtie2 this is achieved by adding the --local parameter. Also, because the two alignment files corresponding to the pair mates are going to be integrated afterwards, it is important that both keep the same read ordering. In bowtie2 the --reorder parameter has to be given so that the reads are output in the same order as in the FASTQ files.

$ bowtie2  --local --reorder -x genome_index -U R1.fastq.gz 2>> R1.log | samtools view -Shb - > R1.bam
$ bowtie2  --local --reorder -x genome_index -U R2.fastq.gz 2>> R2.log | samtools view -Shb - > R2.bam
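Since the later steps rely on both alignment files listing the mates in the same order, a quick sanity check can be useful. The following is only a sketch: read_name_checksum is a hypothetical helper (not part of bowtie2, samtools or HiCExplorer), and it assumes that mates share the same read name apart from an optional trailing /1 or /2.

```shell
# Hypothetical helper: checksum of the read names, in order, of a SAM stream on stdin.
# Header lines (starting with '@') are skipped and a trailing /1 or /2 is removed,
# so two mate files with identical ordering produce identical checksums.
read_name_checksum() {
    grep -v '^@' | cut -f1 | sed 's#/[12]$##' | cksum
}

# Possible usage (requires samtools and the BAM files created above):
#   c1=$(samtools view R1.bam | read_name_checksum)
#   c2=$(samtools view R2.bam | read_name_checksum)
#   [ "$c1" = "$c2" ] && echo "same read order" || echo "read order differs"
```

A checksum is used here so that the read names are streamed instead of being held in memory, which matters for files with many millions of reads.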

A BED file with the coordinates of all positions containing the restriction enzyme motif is required to produce Hi-C matrices at restriction fragment resolution. HiCExplorer comes with a command to find the restriction motifs, called findRestSite. For this example the HindIII restriction site AAGCTT is going to be used. The FASTA sequence of the genome is needed as input:

$ findRestSite --fasta genome.fa --searchPattern AAGCTT --outFile hindIII.bed 
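To make concrete what such a BED file contains, here is a minimal sketch that scans a small FASTA file for a fixed motif. It is an illustration only and not the findRestSite implementation: find_motif_bed is a hypothetical helper, it prints just three columns (chromosome, 0-based start, end), and the real tool may write additional columns. It is also far too slow for a whole genome.

```shell
# Hypothetical helper: report every occurrence of a fixed motif in a FASTA stream.
# $1: motif in upper case (e.g. AAGCTT); FASTA is read from stdin.
# Output: chromosome, 0-based start, end (tab separated, one line per site).
find_motif_bed() {
    awk -v motif="$1" '
        # Scan the sequence collected for the current chromosome.
        function scan(   pos, off, i) {
            off = 0
            while ((i = index(substr(seq, off + 1), motif)) > 0) {
                pos = off + i - 1                  # 0-based start of this match
                printf "%s\t%d\t%d\n", name, pos, pos + length(motif)
                off = pos + 1                      # continue right after the match start
            }
        }
        # A header line closes the previous chromosome and starts a new one.
        /^>/ { if (name != "") scan(); name = substr($1, 2); seq = ""; next }
        # Sequence lines are joined so that motifs spanning a line break are found.
        { seq = seq toupper($0) }
        END { if (name != "") scan() }
    '
}

# Possible usage on a small test file:
#   find_motif_bed AAGCTT < small.fa > sites.bed
```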

Now, using the two BAM files and the hindIII.bed file, a Hi-C matrix is created:

$ hicBuildMatrix -s R1.bam R2.bam --outBam R12.bam \
	--restrictionSequence AAGCTT \
	--minDistance 400 \
	--maxDistance 800 \
	--restrictionCutFile hindIII.bed \
	-o hic.npz > hic.log

The --outBam option names a file that will contain all valid Hi-C reads that were used for the matrix. It is useful for inspecting the quality of the results: good Hi-C sequencing data should show a clear enrichment over the restriction sites.

The --minDistance option is the minimum distance in base pairs between restriction sites. Restriction sites that are closer than this distance are merged into one. A recommended value is half the fragment length used for the library preparation before sequencing.

The --maxDistance option is the maximum distance in base pairs that a read can be away from a restriction site. Only reads that are within this distance from a restriction site are considered. This value should correspond to the fragment length used for library preparation.
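The merging of nearby restriction sites can be pictured with a small sketch. This is not how hicBuildMatrix is implemented: merge_close_sites is a hypothetical helper, and it assumes that "distance" means the gap between the end of the current merged site and the start of the next one, and that merging extends the end coordinate. The input is expected to be a BED stream sorted by chromosome and start.

```shell
# Hypothetical helper: merge restriction sites that are closer than a minimum distance.
# $1: minimum distance in bp; sorted BED (chrom, start, end) is read from stdin.
merge_close_sites() {
    awk -v min="$1" 'BEGIN { OFS = "\t" }
        # Same chromosome and closer than min to the current cluster: extend the cluster.
        chrom == $1 && ($2 - end) < (min + 0) { if ($3 > end) end = $3; next }
        # Otherwise print the finished cluster (if any) and start a new one.
        { if (chrom != "") print chrom, start, end; chrom = $1; start = $2; end = $3 }
        END { if (chrom != "") print chrom, start, end }
    '
}

# Possible usage with the values from the example above:
#   sort -k1,1 -k2,2n hindIII.bed | cut -f1-3 | merge_close_sites 400
```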
