Workflow to identify active transcription factors from CAGE-seq data.

KevinMenden/tf-activity

TF-activity workflow

Pipeline to infer active transcription factors from CAGE-seq data using transcription factor binding motifs.

Description

Given a set of CAGE peaks of interest (e.g. peaks up-regulated in some condition), this pipeline extracts the genomic sequence in a given range around each peak and tests these sequences for enrichment of TF motifs. The background is either the shuffled input sequences or, if supplied, sequences extracted from user-provided CAGE peaks that are not of interest.
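To make the "range around these peaks" step concrete, here is a minimal sketch of widening BED intervals with awk. It is not the pipeline's own code: the flank size `FLANK` and the demo file names are assumptions for illustration, and this README does not state the range the pipeline actually uses.

```shell
# Hypothetical flank size in bp; the pipeline's real range is not stated here.
FLANK=50

# Two toy CAGE peaks in BED format (tab-separated: chrom, start, end, name).
printf 'chr1\t100\t120\tpeak1\nchr2\t10\t30\tpeak2\n' > peaks_demo.bed

# Widen each interval by FLANK on both sides, clamping the start at 0.
# The end is not clamped to the chromosome length in this sketch.
awk -v w="$FLANK" 'BEGIN { FS = OFS = "\t" }
{
    start = $2 - w
    if (start < 0) start = 0
    $2 = start
    $3 = $3 + w
    print
}' peaks_demo.bed > peaks_demo.extended.bed

cat peaks_demo.extended.bed
```

The widened intervals could then be turned into sequences with a tool such as `bedtools getfasta -fi genome.fa -bed peaks_demo.extended.bed`; whether the pipeline uses that tool internally is not stated in this README.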

The pipeline also annotates the peaks and creates TF-gene mappings. When a directory of ENCODE ChIP-seq peaks is supplied (see --encode), the CAGE peaks are additionally intersected with those ChIP-seq peaks.
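The overlap between CAGE peaks and ChIP-seq peaks can be pictured with a naive awk sketch like the one below. It only illustrates the idea on toy data; the file names and the four-column layout are assumptions, and the pipeline itself most likely relies on a dedicated interval tool rather than this quadratic loop.

```shell
# Toy CAGE peaks and toy ChIP-seq peaks (chrom, start, end, name).
printf 'chr1\t100\t120\tpeak1\nchr1\t400\t420\tpeak2\nchr2\t10\t30\tpeak3\n' > cage_demo.bed
printf 'chr1\t110\t200\tTF_A\nchr2\t30\t90\tTF_B\n' > chip_demo.bed

# First file: remember every ChIP-seq interval.
# Second file: print "CAGE peak <tab> TF" for each pair that overlaps.
# BED intervals are half-open, so touching intervals do not count as overlapping.
awk 'BEGIN { FS = OFS = "\t" }
NR == FNR { chrom[NR] = $1; start[NR] = $2; stop[NR] = $3; tf[NR] = $4; n = NR; next }
{
    for (i = 1; i <= n; i++)
        if ($1 == chrom[i] && $2 + 0 < stop[i] + 0 && start[i] + 0 < $3 + 0)
            print $4, tf[i]
}' chip_demo.bed cage_demo.bed > overlap_demo.tsv

cat overlap_demo.tsv
```

Here only `peak1` overlaps a ChIP-seq peak (`TF_A`); `peak3` merely touches `TF_B` at position 30 and is therefore not reported.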

Usage

At minimum, this pipeline needs the following input:

  • CAGE peaks of interest in BED format
  • FASTA file of the reference genome
  • GTF file of the reference genome

--fasta

Path to the FASTA file of the reference genome.

--gtf

Path to the GTF file of the reference genome.

--peaks

Your CAGE peaks of interest in BED format.

--background

Peaks that are not differentially expressed can be supplied here as background peaks. They must be in BED format, just like your peaks of interest.
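Since both --peaks and --background expect BED files, a quick sanity check before starting a run can save time. The following is a rough sketch, not part of the pipeline: it assumes plain tab-separated BED without `track` or `browser` header lines, and the demo file names are made up.

```shell
# Toy file with one valid line and two malformed ones.
printf 'chr1\t100\t120\nchr1\t300\t250\nchr2\t5\n' > check_demo.bed

# Flag lines with fewer than three fields, non-numeric coordinates,
# or a start position larger than the end position.
awk 'BEGIN { FS = "\t"; bad = 0 }
NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 {
    printf "line %d looks malformed: %s\n", NR, $0
    bad++
}
END { exit (bad > 0) }' check_demo.bed > check_demo.log || echo "BED check failed, see check_demo.log"

cat check_demo.log
```

The same check can be pointed at your real peak and background files by replacing `check_demo.bed`.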

--pfms

A file containing all the TF motifs to use. This file must be in homer format. If you do not have a file in homer format, you can instead specify a TF motif file in Jaspar format using the --pfms_jaspar flag. If neither flag is set, the pipeline uses all motifs from the Jaspar core collection.
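If you launch the pipeline from a wrapper script, the choice between the two motif flags can be sketched as below. The helper `motif_args` and all file names are hypothetical. Giving the homer file precedence when both are set is an assumption of this sketch; the README does not say how the pipeline behaves in that case.

```shell
# Hypothetical wrapper helper (not part of the pipeline): pick the motif option.
motif_args() {
    homer_file=$1    # motif file in homer format, or empty
    jaspar_file=$2   # motif file in Jaspar format, or empty
    if [ -n "$homer_file" ]; then
        printf '%s %s' '--pfms' "$homer_file"
    elif [ -n "$jaspar_file" ]; then
        printf '%s %s' '--pfms_jaspar' "$jaspar_file"
    fi
    # Neither set: print nothing, so the pipeline falls back to the Jaspar core collection.
}

# Print (rather than run) the resulting call for a Jaspar-format motif file.
echo "nextflow run kevinmenden/tf-activity --fasta genome.fa --gtf genome.gtf --peaks peaks.bed $(motif_args '' my_motifs.jaspar)"
```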

--encode

A directory containing BED peak files from transcription factor ChIP-seq experiments. These peaks are intersected with your CAGE peaks. If this flag is not set, the intersection step is skipped.

-profile

Which profile to use. Use docker to run with the provided Docker container. For your own environment, it is best to create your own profile config file: look at the files in /conf for examples, create one yourself, and then reference it in the nextflow.config file like so:

profiles {

    standard {
        includeConfig 'conf/base.config'
    }
    docker {
        includeConfig 'conf/base.config'
        includeConfig 'conf/docker.config'
    }
    my_profile {
        includeConfig 'conf/base.config'
        includeConfig 'conf/my_profile.config'
    }

}
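As a starting point for the `conf/my_profile.config` file referenced above, the sketch below writes a small config from the shell. The resource values are placeholders, not recommendations from the pipeline authors; check the existing files in /conf for the settings the pipeline actually expects.

```shell
# Create the conf directory if it does not exist yet.
mkdir -p conf

# Write a minimal, hypothetical profile config using standard Nextflow process settings.
cat > conf/my_profile.config <<'EOF'
// Hypothetical resource settings; adjust to your own machine or cluster.
process {
    executor = 'local'
    cpus = 2
    memory = '8 GB'
}
EOF

cat conf/my_profile.config
```

Once the profile is registered in nextflow.config as shown above, it can be selected by passing `-profile my_profile` to `nextflow run`.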

Additional input options:

Example pipeline call:

nextflow run kevinmenden/tf-activity -profile docker --fasta path/to/genome.fa --gtf path/to/gtf/genome.gtf \
--peaks peaks.bed --background background.bed

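In the above example, the pipeline runs with the Docker image from Docker Hub. Because no motifs are specified, the default Jaspar core motifs are used, and because background peaks are supplied, they are used as background instead of the shuffled input sequences.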
