Methods for methylation and accessible chromatin region analyses:

Steps

1. Download the genome fasta and annotation files and create the directory structure described in the script download_genomes.sh.
2. Process the NAM genomes
- alters the chromosome names in the NAM genome fasta files
- creates a bedtools genome file for use in bedtools
- creates bed files of 100 bp windows, sliding 20 bp, genome-wide
- identifies all contiguous sequences (contigs) across the chromosomes
- this script works in conjunction with the directory structure created by download_genomes.sh
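
As a rough illustration of the window and gap steps listed above, here is a minimal Python sketch. It is not the code in process_NAM_genomes.sh or findgaps.py; the function names are hypothetical, coordinates are assumed to be BED-style (0-based, half-open), and keeping the truncated windows at the chromosome end is an assumption.

```python
import re

def sliding_windows(chrom, length, size=100, step=20):
    """Yield (chrom, start, end) windows of `size` bp, advancing `step` bp.

    Windows running past the chromosome end are truncated (assumption).
    """
    for start in range(0, length, step):
        yield chrom, start, min(start + size, length)

def n_gaps(chrom, seq):
    """Yield one BED interval per run of N bases (either case)."""
    for match in re.finditer(r"[Nn]+", seq):
        yield chrom, match.start(), match.end()

def contigs(chrom, seq):
    """Yield one BED interval per contiguous stretch without N bases."""
    for match in re.finditer(r"[^Nn]+", seq):
        yield chrom, match.start(), match.end()
```

The gaps and contigs are complementary: every position of a chromosome falls in exactly one of the two sets.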

Scripts:

submit_process_NAM_genomes.sh
process_NAM_genomes.sh
findgaps.py

Additional files:

NAM_founder_genome_list.txt

Additional software:

bioawk 1.0
Biopython 1.68
BEDTools 2.28.0

3. Process the NAM gene annotations
- converts gff gene annotation to bed file
- extracts gene information
- changes chromosome names
- identifies the chromosomes and contigs that have annotated genes and those that do not
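
The GFF-to-BED conversion above involves a coordinate shift that is easy to get wrong: GFF is 1-based and inclusive, while BED is 0-based and half-open. The sketch below shows one way to do it in plain Python. It is not taken from process_gene_annotations.sh; the function name, the `rename` mapping, and the use of the `ID` attribute as the gene name are assumptions.

```python
def gff_genes_to_bed(gff_lines, rename=None):
    """Yield BED6 tuples for 'gene' features in GFF3 lines.

    `rename` is a hypothetical {old_name: new_name} chromosome mapping.
    """
    rename = rename or {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene features only
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        chrom = rename.get(fields[0], fields[0])
        start = int(fields[3]) - 1  # 1-based inclusive -> 0-based
        end = int(fields[4])        # inclusive end equals half-open end
        yield chrom, start, end, attrs.get("ID", "."), fields[5], fields[6]
```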

Scripts:

process_gene_annotations.sh (provided here on github)

Additional files:

NAM_founder_genome_list.txt (provided here on github)

Additional software:

BEDTools 2.28.0

4. Create Bowtie2 index files for all the NAM founders
- creates a directory for each NAM founder which contains the index files for aligning reads with Bowtie2
- Creates a separate script for each NAM founder and submits it separately

Scripts:

create_bowtie2_indices.NAM.v1.sh (provided here on github)

Additional files:

NAM_founder_genome_list.txt (provided here on github)

Additional software:

Bowtie2 2.3.5.1

5. Create files of all cytosine coordinates in the NAM genomes
- A separate script is created for each NAM founder and submitted separately
- Split chromosomes into separate files for running fuzznuc
- Extract CG, CHG, and CHH coordinates from the genome fasta file
- concatenate the split chromosomes back together to create single genome-wide file for each sequence context and NAM founder
- this concatenated file is sorted and ready to be joined to the methylome data (which is done in a separate script)
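
The pipeline uses EMBOSS fuzznuc for the context search. To make the three contexts concrete, the following plain-Python sketch classifies every cytosine on both strands (H stands for A, C, or T). It is a stand-in for illustration only, not the method in Extract_all_Cs_NAM_genomes.v4.sh; the function names and the output tuple layout are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def classify(bases):
    """Classify up to 3 bases, read 5'->3' and starting at the cytosine."""
    if len(bases) < 2 or bases[0] != "C":
        return None
    if bases[1] == "G":
        return "CG"
    if len(bases) < 3 or bases[1] not in "ACT" or bases[2] not in "ACGT":
        return None  # too close to the end, or ambiguous base
    return "CHG" if bases[2] == "G" else "CHH"

def cytosine_contexts(chrom, seq):
    """Yield (chrom, start, end, strand, context) for each classifiable C."""
    seq = seq.upper()
    for i, base in enumerate(seq):
        if base == "C":
            strand, ctx = "+", classify(seq[i:i + 3])
        elif base == "G":
            # A G on the forward strand is a C on the reverse strand:
            # reverse-complement the bases ending at position i.
            window = seq[max(0, i - 2):i + 1]
            strand, ctx = "-", classify(window.translate(COMPLEMENT)[::-1])
        else:
            continue
        if ctx:
            yield chrom, i, i + 1, strand, ctx
```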

Scripts:

Extract_all_Cs_NAM_genomes.v4.sh (provided here on github)

Additional files:

NAM_founder_genome_list.txt

Additional software:

pyfaidx 0.5.5.1
EMBOSS 6.6.0

6. Processing the methylome CGmap files
- A separate script is created for each tissue and submitted separately.
- the CGmap files are listed in field 5 of the NAM_methylome_list_merge.v2.txt file.
- These CGmap files are located in the CGmap_home directory
- Convert CGmap files to allc bed files
- correct scaffold names to match the reference genome
- Merge the allc file biological replicates into single files
- convert the allc file to a bed file
- split the bed file by sequence contexts
- Create the files containing coverage for all Cs in the genome, split into the three contexts, and also CHGCGN
- Create coverage files for the three contexts
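
The pipeline relies on methylpy for the replicate merge. As a hedged sketch of the underlying idea, one common way to combine biological replicates is to sum the methylated and total read counts at each cytosine. The record layout below (chrom, position, strand, context, methylated count, coverage) and the function name are assumptions made for illustration, not the exact allc format handling in the script.

```python
from collections import defaultdict

def merge_replicates(replicates):
    """Sum methylated and total read counts per cytosine across replicates.

    Each replicate is an iterable of
    (chrom, pos, strand, context, methylated_count, coverage) tuples.
    Returns {(chrom, pos, strand, context): (methylated_count, coverage)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for replicate in replicates:
        for chrom, pos, strand, context, mc, cov in replicate:
            key = (chrom, pos, strand, context)
            totals[key][0] += mc
            totals[key][1] += cov
    return {key: tuple(counts) for key, counts in sorted(totals.items())}
```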

Scripts:

convert_CGmap_to_allc_then_merge_then_convert_bed.v2.sh (provided here on github)

Additional files:

NAM_methylome_list_merge.v2.txt (provided here on github)
CGmap files (generated from earlier steps in the methods)

Additional software:

methylpy 1.3.2
BEDTools 2.28.0
kenttools 1.04.00

7. Call the unmethylated regions (UMRs)
- Each NAM founder, mapped to both B73 and its cognate reference genome, is submitted as a separate script
- submit_UMR_script.v2.sh is used for submitting the separate scripts
- three variables are assigned in submit_UMR_script.v2.sh: $GENOME, $out_dir, and $prefix
- the UMR_algorithm_all_contexts_v1.3.NAM_founders.sh script calls the UMRs in each sample
- UMRs are called in both the CHG and CHG/CGN contexts, then merged together.
- UMRs are split into those smaller than and larger than 150 bp
- output file: ${out_dir}/${prefix}.final_UMRs.chg.chgcgn.genic-prox-dist.${size}.bed
- output file description:

      field1: chrom
      field2: UMR start
      field3: UMR end
      field4: methylation count
      field5: read coverage count
      field6: informative cytosine count
      field7: percent methylation
      field8: distance of UMR to nearest annotated gene
      field9: "genic", "proximal", or "distal" designation
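
For downstream work, the nine fields described above can be read into a small typed record. This is only a sketch: the class and function names are hypothetical, the file is assumed to be tab-delimited, and the numeric types (counts and percent as floats, distance as an integer) are assumptions rather than something the scripts guarantee.

```python
from typing import NamedTuple

class UMR(NamedTuple):
    chrom: str
    start: int
    end: int
    methylation_count: float
    coverage_count: float
    cytosine_count: float
    percent_methylation: float
    gene_distance: int
    location: str  # "genic", "proximal", or "distal"

    @property
    def length(self):
        """UMR length in bp, used for the 150 bp size split."""
        return self.end - self.start

def parse_umr(line):
    """Parse one tab-delimited line of a final_UMRs bed file."""
    f = line.rstrip("\n").split("\t")
    return UMR(f[0], int(f[1]), int(f[2]), float(f[3]), float(f[4]),
               float(f[5]), float(f[6]), int(f[7]), f[8])
```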

Scripts:

submit_UMR_script.v2.sh (provided here on github)
UMR_algorithm_all_contexts_v1.3.NAM_founders.sh (provided here on github)

Additional files:

NAM_methylome_list_merge.v2.txt (provided here on github)
allc.bed files generated in step 6
files of total cytosine coverage, generated in step 6
100 bp sliding window bed files generated in step 2
bedtools genome files generated in step 2
list of contigs lacking genes, generated in step 2
bed files of genome N-gaps, generated in step 2
bed files of gene annotations, generated in step 3

Additional software:

BEDTools 2.28.0

8. Clean up the UMR files
- Remove the blacklisted regions from the previously called UMRs
- Rename the UMRs to the standardized naming scheme
- Convert the files to the formal BED format

Scripts:

remove_blacklist.change_filenames.convert_bed.NAM_UMRs.v1.sh (provided here on github)

Additional files:

methylomes_table_cross_ref.txt
UMR files generated in step 7
Blacklist files

Additional software:

BEDTools 2.28.0

9. Calling differentially methylated regions (DMRs) and conserved UMRs
- A separate script is generated and submitted for each NAM line
- The coordinates of UMRs in B73 are split into quarters
- The mCHG data from allc files in each NAM line is mapped to the B73 UMR quarters
- Quarters meeting the mapping criteria are kept
- B73 UMRs with all four quarters below 20% mCHG are designated conserved hypomethylated, and those with all four quarters above 50% mCHG are designated DMRs; only these two classes are kept
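
The quartering and threshold logic above can be sketched as follows. This is an illustration under stated assumptions, not the code in extract_NAM_DMRs.v1.sh: the function names are hypothetical, mCHG is expressed as a fraction between 0 and 1, integer division decides where quarter boundaries fall, and a quarter that fails the mapping criteria is represented by `None`.

```python
def quarters(start, end):
    """Split the half-open interval [start, end) into four consecutive parts."""
    length = end - start
    cuts = [start + (length * i) // 4 for i in range(5)]
    return list(zip(cuts[:-1], cuts[1:]))

def classify_umr(quarter_mchg, low=0.20, high=0.50):
    """Classify a B73 UMR from the mCHG level of its four quarters.

    Returns "conserved_hypomethylated", "DMR", or None (UMR not kept).
    """
    if len(quarter_mchg) != 4 or any(v is None for v in quarter_mchg):
        return None  # at least one quarter failed the mapping criteria
    if all(v < low for v in quarter_mchg):
        return "conserved_hypomethylated"
    if all(v > high for v in quarter_mchg):
        return "DMR"
    return None  # mixed or intermediate methylation
```

Requiring all four quarters to agree means a UMR that is only partly methylated in the other line falls into neither class.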

Scripts:

extract_NAM_DMRs.v1.sh (provided here on github)

Additional files:

NAM_mapping_list.txt (provided here on github)
7-column UMR bed file generated in step 7
meth_*.ref_B73.allc.bed.chg file generated in step 6

Additional software:

BEDTools 2.29.2

10. The initial ATAC-seq read processing
- Run each ATAC-seq replicate (fastq files) through fastp in paired-end mode

Scripts:

ATAC_initial_read_process.NAM.v1.sh (provided here on github)

Additional files:

genome_mapping_combinations.txt (provided here on github)

Additional software:

fastp 0.20.0

11. Map the ATAC-seq reads with bowtie2
- Align each ATAC-seq replicate to the B73 reference genome and also the cognate reference genome
- Each replicate is submitted as a separate script

Scripts:

ATAC_map_reads.NAM.v1.sh (provided here on github)

Additional files:

genome_mapping_combinations.txt (provided here on github)

Additional software:

Bowtie2 2.3.5.1

12. Filter the aligned ATAC-seq reads
- This script creates and submits a separate script for each NAM line, mapped to B73 reference and its cognate reference
- Convert sam file to bam file
- remove duplicates
- filter by MAPQ >= 30
- Create 1 bp resolution bigwig file
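
The actual filtering is done with the tools listed below. Purely to show what the MAPQ criterion means at the record level, here is a small Python sketch over SAM text lines, where the mapping quality is the fifth tab-delimited column and header lines start with `@`. The function name is hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=30):
    """Yield SAM header lines and alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header untouched
            continue
        fields = line.split("\t")
        # A valid alignment line has at least 11 mandatory columns.
        if len(fields) >= 11 and int(fields[4]) >= min_mapq:
            yield line
```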

Scripts:

ATAC_process_bams_v3.sh (provided here on github)

Additional files:

genome_mapping_combinations.txt (provided here on github)

Additional software:

deepTools 3.3.1
SAMtools 1.10
sambamba 0.7.1

13. Call accessible chromatin regions (ACRs) with MACS2 and quantify their overlap with UMRs
- Call peaks on ATAC data with MACS2 using paired-end mode, q-value cutoff of 0.005, and effective genome size of 1.8e+9 bp
- remove ACRs overlapping blacklisted N-gap regions
- designate ACR as genic, proximal, or distal, depending on proximity to nearest annotated gene
- calculate the percent overlap of ACRs with UMRs
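
The overlap calculation can be read in more than one way. The sketch below takes it to mean the percentage of ACRs that overlap at least one UMR by 1 bp or more, which is an assumption; the script may instead measure overlap in base pairs. The pipeline uses BEDTools for this step, so the plain-Python version here (with hypothetical function names and a simple all-against-all comparison) is for illustration only.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def percent_acrs_in_umrs(acrs, umrs):
    """Percent of ACRs overlapping at least one UMR (0.0 if no ACRs)."""
    if not acrs:
        return 0.0
    hits = sum(1 for acr in acrs if any(overlaps(acr, umr) for umr in umrs))
    return 100.0 * hits / len(acrs)
```

Intervals that merely touch (one ends where the other starts) do not count as overlapping under half-open coordinates.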

Scripts:

macs2_call_acrs.nam_lines.v5.sh (provided here on github)

Additional files:

genome_mapping_combinations.txt (provided here on github)
UMR bed files generated in step 8
ATAC-seq bam files generated in step 12
list of contigs lacking genes, generated in step 2
bed files of genome N-gaps, generated in step 2
bed files of gene annotations, generated in step 3

Additional software:

MACS2 2.2.7.1
BEDTools 2.29.2
