
Data set #126

Open
rehamFatima opened this issue Jun 26, 2018 · 24 comments

Comments

@rehamFatima

Hi,

I have been trying to run your pipeline on the mm10 data set (as set up by running install_genome_data.sh). As there were no fastq files, I made a BAM file from the BED files and am running the pipeline with the following command:

$ bds atac.bds -species mm10 -gensz mm -bam1 ../genomes/mm10_enh_dhs.bam -chrsz ../genomes/mm10/mm10.chrom.sizes

I am not getting the output directory structure described on your main page, nor a Report.html file. Any help with this would be highly appreciated.

Many thanks.
Kind Regards,
Reham

@leepc12
Collaborator

leepc12 commented Jun 26, 2018

If you don't explicitly specify an output directory with -out_dir [SOMEWHERE], then out/ will be generated in the working directory where you ran the pipeline command. Also, skip -gensz mm; it's redundant.

@rehamFatima
Author

out/ is generated, but it does not have all the files listed on your main page. Could you please tell me what data you used to generate all those files? I am not sure whether the discrepancy is in the data set I am using or whether something is missing from the pipeline installation.

Thank you

@rehamFatima
Author

rehamFatima commented Jun 29, 2018

Could you also please tell me which fastqs you used when running the pipeline? The data set coming from install_genome_data.sh only contains fasta and bed files.

Many thanks

@rehamFatima
Author

I have tried running it with different fastqs from various sources, but that crashes with Error (modules/align_bowtie2.bds, line 44, pos 3): Bowtie2 index (-bwt2_idx) doesn't exists! (file: .1.bt2 or .1.bt2l)

@leepc12
Collaborator

leepc12 commented Jun 29, 2018

Did you specify species (hg38, mm10, ...) by -species [SPECIES]? Did you install genome data correctly? Did you see a "success" message at the end of the genome data installation shell script?
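A quick way to verify the index files exist (a sketch, not part of the pipeline itself; the path below is a placeholder you would replace with your own genome data location):

```shell
# Hedged sanity check: the pipeline expects Bowtie2 index files ending in
# .1.bt2 (or .1.bt2l for large genomes) under the genome data directory.
# The path here is a placeholder, not an actual install location.
idx=/path/to/genome/mm10/bowtie2_index
ls "$idx"/*.1.bt2 "$idx"/*.1.bt2l 2>/dev/null \
  || echo "no .1.bt2/.1.bt2l files found: -bwt2_idx will fail"
```

If the ls finds nothing, the pipeline will raise the same -bwt2_idx error shown above.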

@rehamFatima
Author

Thanks Jin, yes, I have specified -species mm10. I have reinstalled the genome data and did see the "success" message. It still does not contain any fastq files. Is it supposed to, or are the fastq files to be retrieved from elsewhere?

@leepc12
Collaborator

leepc12 commented Jul 2, 2018

Please post a full log (including Bowtie2 index (-bwt2_idx) doesn't exists!) for debugging. Also, what is the full command line that you used to run the pipeline?

@rehamFatima
Author

The command is, e.g.:

bds atac.bds -species mm10 -se -gensz mm -fastq1 ../genomes/mm10_no_alt_analysis_set_ENCODE.fastq.gz -chrsz ../genomes/mm10/mm10.chrom.sizes

and the output is :

Picked up _JAVA_OPTIONS: -Xms8G -Xmx8G -XX:ParallelGCThreads=1


== git info
Latest git commit		: d0babc6d2cc0aa8038c5ad35a41ed9876f49619d (Tue Jun 5 18:40:20 2018)
	Reading parameters from section (default) in file(/homes/reham/ATAC-Seq/atac_dnase_pipelines/default.env)...


== configuration file info
Hostname			: ebi5-164.ebi.ac.uk
Configuration file		: 
Environment file		: /homes/reham/ATAC-Seq/atac_dnase_pipelines/default.env


== parallelization info
No parallel jobs		: false
Maximum # threads 		: 8


== cluster/system info
Walltime (general)		: 5h50m
Max. memory (general)		: 7G
Force to use a system		: local
Process priority (niceness)	: 0
Retiral for failed tasks	: 0
Submit tasks to a cluster queue	: 
Unlimited cluster mem./walltime	: false
Use --acount instead of SLURM partition		: false
Java temporary directory		: ${TMPDIR}


== shell environment info
Conda env. 			: bds_atac
Conda env. for python3		: bds_atac_py3
Conda bin. directory		: 

Shell cmd. for init.		: if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi;  export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for init.(py3)	: if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi;  export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for fin.		: TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Cluster task min. len.		: 60

Cluster task delay			: 0


== output directory/title info
Output dir.			: /homes/reham/ATAC-Seq/atac_dnase_pipelines/out
Title (prefix)			: atac_dnase_pipelines
	Reading parameters from section (default) in file(/homes/reham/ATAC-Seq/atac_dnase_pipelines/default.env)...


== species settings
Species				: mm10
Species file			: 

Species name (WashU browser)	: mm10
Ref. genome seq. fasta		: 
Chr. sizes file			: ../genomes/mm10/mm10.chrom.sizes
Black list bed			: 
Ref. genome seq. dir.		: 


== ENCODE accession settings
ENCODE experiment accession	: 
ENCODE award RFA			: 
ENCODE assay category		: 
ENCODE assay title		: 
ENCODE award				: 
ENCODE lab				: 
ENCODE assembly genome		: 
ENCODE alias prefix		: KLAB_PIPELINE
ENCODE alias suffix		: 


== report settings
URL root for output directory	: 
Genome coord. for browser tracks	: 


== align multimapping settings
# alignments reported for multimapping	: 0


== align bowtie2 settings
Bowtie2 index			: 
Replacement --score-min for bowtie2	: 
Walltime (bowtie2)		: 47h
Max. memory (bowtie2)		: 12G
Extra param. (bowtie2)		: 
Disable index on memory (bowtie2)	: false


== adapter trimmer settings
Maximum allowed error rate for cutadapt	: 0.10
Minimum trim. length for cutadapt -m	: 5
Walltime (adapter trimming)		: 23h
Max. memory (adapter trimming)		: 12G


== postalign bam settings
MAPQ reads rm thresh.		: 30
Rm. tag reads with str.		: chrM
No dupe removal in filtering raw bam	: false
Walltime (bam filter)		: 23h
Max. memory (bam filter)	: 12G
Dup marker				: picard
Use sambamba markdup (instead of picard)	: false


== postalign bed/tagalign settings
Max. memory for UNIX shuf			: 12G
No --random-source for UNIX shuf		: false


== postalign cross-corr. analysis settings
Max. memory for UNIX shuf			: 12G
User-defined cross-corr. peak strandshift	: -1
Extra parameters for cross-corr. analysis	: 
Max. memory for cross-corr. analysis		: 15G
Stack size for cross-corr. analysis		:


== callpeak macs2 settings
Genome size (hs,mm)		: mm
Walltime (macs2)		: 23h
Max. memory (macs2)		: 15G
Cap number of peaks (macs2)	: 300K
Extra parameters for macs2 callpeak		: 


== callpeak naiver overlap settings
Bedtools intersect -nonamecheck	: false


== IDR settings
Append IDR threshold to IDR out_dir	: false


== ATAQC settings
TSS enrichment bed		: 
DNase bed for ataqc		: 
Promoter bed for ataqc		: 
Enhancer bed for ataqc		: 
Reg2map for ataqc			: 
Reg2map_bed for ataqc		: 
Roadmap metadata for ataqc	: 
Max. memory for ATAQC			: 20G
Walltime for ATAQC			: 47h


== atac pipeline settings
Type of pipeline			: atac-seq
Align only				: false
# reads to subsample replicates (0 if no subsampling)	: 0
# reads to subsample for cross-corr. analysis 	: 25000000
No pseudo replicates			: false
No ATAQC (advanced QC report)		: false
No Cross-corr. analysis			: false
Use CSEM for alignment			: false
Smoothing window for MACS2		: 150
DNase Seq				: false
IDR threshold				: 0.1
Force to use ENCODE3 parameter set	: false
Force to use ENCODE parameter set	: false
Disable genome browser tracks	: false
p-val thresh. for overlapped peaks	: 0.01
MACS2 p-val thresh. for peaks	: 0.01
MACS2 p-val thresh. for BIGWIGs		: 0.01
Enable IDR on called peaks	: false
Automatically find/trim adapters	: false

== checking atac parameters ...
00:00:00.925	Error (modules/align_bowtie2.bds, line 44, pos 3): Bowtie2 index (-bwt2_idx) doesn't exists! (file: .1.bt2 or .1.bt2l)

@leepc12
Collaborator

leepc12 commented Jul 2, 2018 via email

@rehamFatima
Author

rehamFatima commented Jul 2, 2018 via email

@leepc12
Collaborator

leepc12 commented Jul 2, 2018 via email

@rehamFatima
Author

rehamFatima commented Jul 4, 2018



## Get hostname with the following command: 
## $ hostname -f
##
## Configure environment per hostname:
## [hostname1]
## ...
##
## Use the same environment for multiple hostnames:
## [hostname2, hostname3, ...]
## ...
##
## Using group
## [hostname1, hostname2, ... : group]
## [group]
## ...
##
## Using an asterisk in hostnames (IMPORTANT: only one * is allowed in hostnames)
##
## [host*name1]
##
## [*hostname2, hostname3*]

# Stanford Kundaje group clusters (out of SGE)
[vayu, mitra, durga]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
conda_bin_dir	= /software/miniconda3/bin
species_file	= $script_dir/species/kundaje.conf
unlimited_mem_wt= true 		# unlimited max. memory and walltime on Kundaje clusters
nice 		= 20
nth 		= 3

# Stanford Kundaje group clusters (controlled with SGE)
[nandi, kali, amold, wotan, kadru, surya, indra]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
conda_bin_dir	= /software/miniconda3/bin
species_file	= $script_dir/species/kundaje.conf
unlimited_mem_wt= true 		# unlimited max. memory and walltime on Kundaje clusters
system 		= sge
nice 		= 20
nth 		= 3

# Stanford NEW SCG
[*.scg.stanford.edu, dper730xd*, hppsl230s*, dper910*, sgiuv*, sgisummit*, smsx10srw*]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
species_file	= $script_dir/species/scg.conf
nth		= 4		# number of threads for each pipeline
system		= slurm		# force to use SLURM SCG
q_for_slurm_account = true	# use --account instead of -p (partition)
cluster_task_delay = 10 	# for NFS delayed write

# Stanford OLD SCG4
[scg*.stanford.edu, scg*.local, carmack.stanford.edu, crick.stanford.edu]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
species_file	= $script_dir/species/scg.conf
nth		= 4		# number of threads for each pipeline
system		= sge		# force to use SGE (Sun Grid Engine) on SCG3/4 even though a user doesn't explicitly specify SGE on command line with 'bds -s sge atac.bds ...'
cluster_task_delay = 10 	# for NFS delayed write

# Stanford Sherlock clusters
[sherlock*.stanford.edu, sh-*.local, sh-*.int, sh-ln*.stanford.edu]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
species_file	= $script_dir/species/sherlock.conf
nth		= 4		# number of threads for each pipeline
system		= slurm
cluster_task_delay = 30		# for NFS delayed write


# default
[default]
conda_env	= bds_atac
conda_env_py3	= bds_atac_py3
species_file	= /nfs/nobackup/ensembl/reham_ens/genome/bds_atac_species.conf	# DEF_SPECIES_FILE: DO NOT MODIFY THIS COMMENT (install_genome_data.sh WILL NOT WORK).

@rehamFatima
Author

rehamFatima commented Jul 4, 2018

$ ls -l /nfs/nobackup/ensembl/reham_ens/genome/mm10/

total 4327335
drwxr-sr-x 2 reham ensembl        347 Jun 29 17:32 ataqc
drwxr-sr-x 2 reham ensembl        429 Jul  2 10:01 bowtie2_index
-rw-r--r-- 1 reham ensembl       1263 Oct 22  2016 mm10.blacklist.bed.gz
-rw-r--r-- 1 reham ensembl       1405 Jul  2 10:01 mm10.chrom.sizes
-rw-r--r-- 1 reham ensembl   26078281 Jul  2 14:22 mm10_enh_dhs.bam
-rw-r--r-- 1 reham ensembl 2785490220 Jul  2 09:46 mm10_no_alt_analysis_set_ENCODE.fasta
-rw-r--r-- 1 reham ensembl       2511 Jul  2 10:01 mm10_no_alt_analysis_set_ENCODE.fasta.fai
-rw-r--r-- 1 reham ensembl  870379095 Jan 23  2016 mm10_no_alt_analysis_set_ENCODE.fasta.gz
drwxr-sr-x 2 reham ensembl       2544 Jul  2 09:46 seq

(I have had to reinstall the genome data in another location.)

@rehamFatima
Author

$ ls -l /nfs/nobackup/ensembl/reham_ens/genome
total 62103
-rw-r--r-- 1 reham ensembl     1124 Jul  2 10:01 bds_atac_species.conf
-rwxr--r-- 1 reham ensembl 26286699 Jun 29 17:45 ERR1659027_1.fastq
-rwxr--r-- 1 reham ensembl 26286699 Jun 29 17:45 ERR1659027_2.fastq
drwxr-sr-x 5 reham ensembl      410 Jul  2 14:22 mm10

and

ls -l /homes/reham/ATAC-Seq/genomes/
total 1901192
-rwxr--r-- 1 reham ensembl 1884648321 Jun 14 14:52 ENCFF124LBK.fastq.gz
drwxr-xr-x 3 reham ensembl       4096 Jun 29 17:32 mm10
-rw-r--r-- 1 reham ensembl   26079643 Jun 14 10:59 mm10_enh_dhs.bam
-rw-r--r-- 1 reham ensembl    2342848 Jun 29 16:30 mm10_enh_dhs.bam.bai
-rw-r--r-- 1 reham ensembl   26079835 Jun 29 16:29 mm10_enh_dhs.bam.sorted

I hope these will help.
Many thanks

@leepc12
Collaborator

leepc12 commented Jul 4, 2018

Please add -species_file /nfs/nobackup/ensembl/reham_ens/genome/bds_atac_species.conf to the command line and try again.
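For reference, a species file entry generated by install_genome_data.sh looks roughly like the sketch below. Key names and exact paths are illustrative only; check the actual bds_atac_species.conf produced by the installer for the real values:

```
[mm10]
chrsz     = /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10.chrom.sizes
seq       = /nfs/nobackup/ensembl/reham_ens/genome/mm10/seq
bwt2_idx  = /nfs/nobackup/ensembl/reham_ens/genome/mm10/bowtie2_index/mm10_no_alt_analysis_set_ENCODE.fasta
blacklist = /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10.blacklist.bed.gz
```

Passing -species_file lets the pipeline resolve per-species paths (including the Bowtie2 index) that are otherwise blank, as seen in the "== species settings" section of the log above.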

@rehamFatima
Author

rehamFatima commented Jul 6, 2018

Command :

bds atac.bds -species mm10 -species_file /nfs/../genome/bds_atac_species.conf -gensz mm -nth 1 -bam1 /nfs/../genome/mm10/mm10_enh_dhs.bam -system local -no_par

Output :

== checking input files ...

Rep1 bam (PE) :
	/nfs/../genome/mm10/mm10_enh_dhs.bam
Distributing 1 to ... 
{1=1}
awk: cmd. line:1: fatal: division by zero attempted
Task failed:
	Program & line     : 'modules/postalign_bam.bds', line 341
	Task Name          : 'dedup_bam_PE_2 rep1'
	Task ID            : 'atac.bds.20180706_093826_930/task.postalign_bam.dedup_bam_PE_2_rep1.line_341.id_10'
	Task PID           : '1925'
	Task hint          : 'samtools view -F 1804 -f 2 -b /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam; sambamba index -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam; s'
	Task resources     : 'cpus: -1	mem: -1.0 B	wall-timeout: 8640000'
	State              : 'ERROR'
	Dependency state   : 'ERROR'
	Retries available  : '1'
	Input files        : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam]'
	Output files       : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam, /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.flagstat.qc, /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.pbc.qc]'
	Script file        : '/homes/reham/ATAC-Seq/atac_dnase_pipelines/atac.bds.20180706_093826_930/task.postalign_bam.dedup_bam_PE_2_rep1.line_341.id_10.sh'
	Exit status        : '1'
	Program            : 
		
		# SYS command. line 343
		
		 if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi;  export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
		
		# SYS command. line 350
		
		 samtools view -F 1804 -f 2 -b /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam
		
		# SYS command. line 352
		
		 sambamba index -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam
		
		# SYS command. line 354
		
		 sambamba flagstat -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.flagstat.qc
		
		# SYS command. line 365
		
		 sambamba sort -t 1 -n /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam
		
		# SYS command. line 367
		
		 bedtools bamtobed -bedpe -i /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam | \
							awk 'BEGIN{OFS="\t"}{print $1,$2,$4,$6,$9,$10}' | \
							grep -v 'chrM' | sort | uniq -c | \
							awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{m1_m2=-1.0; if(m2>0) m1_m2=m1/m2; printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1_m2}' > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.pbc.qc
		
		# SYS command. line 371
		
		 rm -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam
		
		# SYS command. line 373
		
		 TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
	StdErr (100000000 lines)  :
		awk: cmd. line:1: fatal: division by zero attempted

Fatal error: modules/postalign_bam.bds, line 378, pos 4. Task/s failed.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
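The awk failure above can be reproduced in isolation. A minimal sketch (my simplification, not the pipeline's exact script): the PBC QC step divides by the total pair count mt, so if no properly-paired reads survive filtering, the stream feeding awk is empty, mt stays 0, and the division aborts:

```shell
# Minimal sketch of the failure mode: an empty input stream leaves mt at 0,
# and computing m0/mt then triggers "division by zero attempted".
printf '' | awk 'BEGIN{mt=0;m0=0} {m0+=1;mt+=$1} END{
  if (mt==0) { print "empty input: m0/mt would divide by zero"; exit 0 };
  printf "%f\n", m0/mt }'
```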

@leepc12
Collaborator

leepc12 commented Jul 9, 2018

Can you share your BAM files for debugging?

original:
/nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam
processed:
/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam

Is your input BAM file filtered/deduped? Any quality problems with the input BAM file?

@rehamFatima
Author

Please find attached:

BamFiles.zip

I just used the BED files within the installed mm10 genome data and generated the BAM file using bedtools bedtobam. Could you tell me what I should be looking for in terms of quality problems with the BAM file?
Thank you

@leepc12
Collaborator

leepc12 commented Jul 10, 2018

That BED file in the mm10 genome data is not for testing purposes.

If you just want to test our pipeline, I recommend using the new pipeline: https://github.com/ENCODE-DCC/atac-seq-pipeline.
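One likely concrete reason the converted BAM fails (my reading of the logs above, not the maintainer's statement): bedtools bedtobam emits unpaired records, while the pipeline's paired-end path filters with samtools view -F 1804 -f 2, i.e. it keeps only properly-paired reads. An unpaired record never has the 0x2 bit set, so every read is discarded and the downstream PBC awk script receives an empty stream. The flag logic, sketched:

```shell
# Sketch of the SAM flag check behind "samtools view -f 2": a record converted
# from a plain BED line has FLAG 0, so the 0x2 "properly paired" bit is unset
# and the read is filtered out.
flag=0
if [ $(( flag & 2 )) -eq 0 ]; then
  echo "FLAG $flag lacks the 0x2 (properly paired) bit: dropped by -f 2"
fi
```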

@rehamFatima
Author

Thanks Jin. Could you please tell me what data set you used to generate the files on your main page? I have even tried ATAC-seq data sets from ENA.

@leepc12
Collaborator

leepc12 commented Jul 11, 2018

We recommend using our new pipeline at https://github.com/ENCODE-DCC/atac-seq-pipeline/. The new pipeline supports Docker, DNAnexus, and Google Cloud Platform, so you can avoid dependency problems.

Please read through the documentation, then find an example at /examples/scg, download data from the ENCODE portal, and try running the pipeline.

@rehamFatima
Author

rehamFatima commented Jul 16, 2018

Thank you Jin. May I ask how the ATAC-seq pipeline uses phantompeakqualtools? It seems to be failing there.
Thanks

@leepc12
Collaborator

leepc12 commented Jul 16, 2018

@rehamFatima: Sure, run_spp.R is used for cross-correlation analysis. The actual shell command lines are here: https://github.com/kundajelab/atac_dnase_pipelines/blob/master/modules/postalign_xcor.bds#L113

What kind of error did you get?

@leepc12
Collaborator

leepc12 commented Aug 16, 2018

@rehamFatima I actually don't understand why you want to test our pipeline with a BED file (or its BAM conversion), which is not designed to be a valid input to the pipeline.

The error you got in issue #128 is an input data quality problem. Please try with some samples from the ENCODE portal.

Instead of using this old pipeline, can you work with our new pipeline?

A very detailed tutorial with sample data is available at https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/hotfix_PIP-357/docs/tutorial_google.md

You may need to sign up for Google Cloud Platform.
