Bismark issues 0.24.2 #652

Open
NeGomics opened this issue Feb 13, 2024 · 7 comments

Comments

@NeGomics

I am running Bismark on paired-end data with the command:

bismark --multicore 12 path/genome_dir --output_dir path/alignment -1 <(gunzip -c "filename_R1.fq.gz") -2 <(gunzip -c "filename_R2.fq.gz")

and I get the following errors:

1. Error:

Child process terminated with exit signal: '65280'
Child process terminated with exit signal: '65280'
Child process terminated with exit signal: '65280'
Child process terminated with exit signal: '3072'
Child process terminated with exit signal: '3072'
Child process terminated with exit signal: '3072'
Child process terminated with exit signal: '3072'
Terminating. Not all child processes successfully finished. at /home/cloud-user/anaconda3/bin/bismark line 602, line 106663392.

2. (ERR): bowtie2-align exited with value 137
Killed

3. Chromosomal sequence could not be extracted for LH00392:4:222GKJLT4:2:1140:2829:28343_1:N:0:AAGGAAGG+ACTCCGGT chrM 16421

@FelixKrueger
Owner

Regarding 1: as a first step, I would check whether Bismark runs fine without --multicore 12, e.g. with:

bismark --genome path/genome_dir --output_dir path/alignment -1 filename_R1.fq.gz -2 filename_R2.fq.gz

(Note that you don't need the gunzip process substitution; Bismark reads gzipped FastQ files directly.)

As a reminder, for the human genome --multicore 12 would use roughly 36 cores and more than 150 GB of RAM. Do you have that much available?

The exit value 137 in 2. usually indicates that the OS killed the process because of memory limits (137 = 128 + 9, i.e. the process received SIGKILL, which is also what the "Killed" line reports).
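The numbers in these messages can be decoded with some simple arithmetic. A minimal sketch, assuming that the value Bismark reports as "exit signal" is Perl's raw $? wait status (exit code in the high byte, signal number in the low 7 bits) and that 137 follows the usual shell convention of 128 + signal number. Both decoding functions are hypothetical helpers, not part of Bismark:

```python
import signal


def _signal_name(sig: int) -> str:
    """Map a signal number to its name, e.g. 9 -> 'SIGKILL' (POSIX systems)."""
    try:
        return signal.Signals(sig).name
    except ValueError:
        return "unknown signal"


def decode_wait_status(status: int) -> str:
    """Decode a raw wait status such as Perl's $? (assumed format of Bismark's message)."""
    sig = status & 0x7F  # low 7 bits: signal that terminated the child (0 = none)
    if sig:
        return f"killed by signal {sig} ({_signal_name(sig)})"
    return f"exited with code {status >> 8}"  # high byte: the child's exit code


def decode_shell_exit(code: int) -> str:
    """Decode a shell-style exit value, where 128 + N means 'killed by signal N'."""
    if 128 < code < 128 + signal.NSIG:
        sig = code - 128
        return f"killed by signal {sig} ({_signal_name(sig)})"
    return f"exited with code {code}"


print(decode_wait_status(65280))  # exited with code 255
print(decode_wait_status(3072))   # exited with code 12
print(decode_shell_exit(137))     # killed by signal 9 (SIGKILL)
```

Read this way, '65280' and '3072' are plain non-zero exit codes (255 and 12) of the child processes rather than signals, while 137 corresponds to SIGKILL, the signal the Linux out-of-memory killer sends.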

Regarding 3: this warning is displayed when reads align to the very edge of a chromosome (typically chrM). The extra sequence is only needed to determine the cytosine context, so the warning is normally fine to ignore.
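Why the chromosome edge matters: a cytosine's context (CpG, CHG or CHH, where H is A, C or T) is defined by the next two bases on the same strand, so those bases have to be looked up in the genome, and at the end of a chromosome they may not exist. A simplified, top-strand-only sketch; `cytosine_context` is a hypothetical helper, not Bismark's implementation:

```python
from typing import Optional


def cytosine_context(chrom_seq: str, pos: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the C at 0-based `pos` on the top strand.

    Returns None when the context cannot be determined, e.g. because the
    downstream bases would lie past the end of the chromosome.
    """
    seq = chrom_seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is not a C")
    # Slicing returns "" past the end of the string instead of raising IndexError
    next1 = seq[pos + 1 : pos + 2]
    next2 = seq[pos + 2 : pos + 3]
    if next1 == "G":
        return "CG"
    if next1 not in ("A", "C", "T"):
        return None  # past the chromosome end, or an ambiguous base such as N
    if next2 == "G":
        return "CHG"
    if next2 in ("A", "C", "T"):
        return "CHH"
    return None  # second downstream base missing or ambiguous


print(cytosine_context("ACGT", 1))  # CG
print(cytosine_context("AAC", 2))   # None: no bases left after the C
```

Bismark also has to handle the bottom strand and the full read span, but the idea is the same: a read ending in the last bases of chrM leaves no flanking sequence to look up.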

@NeGomics
Author

Thank you, Felix. I have a 32-core virtual machine on a high-performance cluster and 92 files to map. By default Bismark uses only one core. If I run multiple samples from a bash script, I need to specify the number of cores. How many cores will it support, and how long will it take for one paired-end sample?

@FelixKrueger
Owner

In the default mode, Bismark requires 3 cores at 100% and ~10-15 GB of RAM. Any factor given to --multiple multiplies these requirements (see --help for more details).
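With the per-instance figures above (3 cores and roughly 10-15 GB of RAM per Bismark instance, multiplied by the parallelisation factor), the largest factor a machine can sustain is set by whichever resource runs out first. A back-of-the-envelope sketch; the function names are hypothetical, and the 128 GB in the example is an assumed value, since the RAM of the 32-core VM was not stated:

```python
CORES_PER_INSTANCE = 3     # default mode, as stated above
RAM_GB_PER_INSTANCE = 15   # upper end of the ~10-15 GB estimate, to be safe


def required_resources(factor: int) -> tuple[int, int]:
    """Cores and RAM (GB, upper estimate) needed for a given parallelisation factor."""
    return factor * CORES_PER_INSTANCE, factor * RAM_GB_PER_INSTANCE


def max_parallel_factor(total_cores: int, total_ram_gb: int) -> int:
    """Largest factor that fits both the core and the RAM budget (0 = not even one)."""
    by_cores = total_cores // CORES_PER_INSTANCE
    by_ram = total_ram_gb // RAM_GB_PER_INSTANCE
    return min(by_cores, by_ram)


print(required_resources(12))        # (36, 180): ~36 cores and >150 GB for --multicore 12
print(max_parallel_factor(32, 128))  # 8: RAM, not cores, is the limit here
```

If several samples run at the same time from a bash loop, they share these totals, so each sample's factor has to shrink accordingly.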

The run time depends on several factors: the genome (size), its repeat content, the library type, the read length and the parameters used. I suggest first trimming the data with Trim Galore, then running a test on a subset (e.g. 10 million reads, -u 10000000) and extrapolating from there.
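The extrapolation step is simple arithmetic once the test run has finished. A minimal sketch, assuming that run time scales roughly linearly with the number of reads once a fixed start-up cost (e.g. loading the genome index) is subtracted; the function names and the example timings are hypothetical. For paired-end data, counting only the R1 file keeps the total in read pairs, which is assumed here to be the unit -u counts:

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count records in a (optionally gzipped) FastQ file: 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def extrapolate_runtime(test_seconds: float, test_reads: int,
                        total_reads: int, overhead_seconds: float = 0.0) -> float:
    """Scale the run time of a subset test (e.g. -u 10000000) to the full dataset."""
    per_read = (test_seconds - overhead_seconds) / test_reads
    return overhead_seconds + per_read * total_reads


# Hypothetical numbers: the 10-million-read test took 30 min; the sample has 200 M pairs
hours = extrapolate_runtime(1800, 10_000_000, 200_000_000) / 3600
print(f"{hours:.1f} h")  # 10.0 h
```

Setting `overhead_seconds` to the time before alignment actually starts makes the estimate less pessimistic for small tests.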

@NeGomics
Author

Thank you. I have the newer human genome GRCh38 as well as the T2T+Y (2023) genome, separately. The read length is 150 bp, generated on an Illumina NovaSeq.
There is no --multiple option; it has --parallel. Currently it is running on one paired-end dataset with all reads.

@FelixKrueger
Owner

I am not exactly sure what the memory requirements are for the T2T genome, but I doubt they will be lower than for GRCh38.

The options --parallel and --multicore are equivalent.

The most important thing for you to do is

@NeGomics
Author

Thank you. It is larger than GRCh38. The library is BS-seq, sequenced on an Illumina NovaSeq.

@FelixKrueger
Owner

There are at least a dozen different types of BS-seq; it is important to know the specific details (e.g. Accel Swift, PBAT, scNMT, WGBS, RRBS...).
