Error with Correcting UB tags when working with large sample size #388

Open
kvn95ss opened this issue Jan 28, 2024 · 0 comments

Describe the bug
Getting the following error -

Correcting UB tags...
[1] "5.4e+08 Reads per chunk"
[1] "2024-01-28 15:27:24 CET"
[1] "Here are the detected subsampling options:"
[1] "Automatic downsampling"
[1] "Working on barcode chunk 1 out of 2"
[1] "Processing 403 barcodes in this chunk..."
[1] "Working on barcode chunk 2 out of 2"
[1] "Processing 265 barcodes in this chunk..."
Error in alldt[[i]][[1]] <- rbind(alldt[[i]][[1]], newdt[[i]][[1]]) :
  more elements supplied than there are to replace
Calls: bindList
In addition: Warning messages:
1: In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
2: In parallel::mclapply(mapList, function(tt) { :
  all scheduled cores encountered errors in user code
Execution halted
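
For what it's worth, the bindList error itself looks like a downstream symptom: the warnings suggest the mclapply workers failed first, so bindList was handed "try-error" objects instead of the data chunks it expected to rbind. A minimal sketch (not zUMIs code; the failing worker body is made up) of how to surface the underlying worker errors:

library(parallel)

# Simulate workers that fail inside mclapply(); each failed element comes back
# as a "try-error" object instead of the expected result.
res <- mclapply(1:4, function(i) stop("worker ", i, " failed"), mc.cores = 2)

# Flag the failed elements and print their real error messages.
failed <- vapply(res, inherits, logical(1), what = "try-error")
if (any(failed)) print(res[failed])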

Running this on Rackham, with single-end reads generated from SmartSeq3.

Some context: I used merge_demultiplexed_fastq.R to combine our ~600 samples, resulting in a 30 GB R1.fastq.gz file and a 5 GB index file. I also modified the STAR alignment code to run a single instance with 20 threads.

The generated filtered.Aligned.GeneTagged.sorted.bam had a few reads with a negative position, so I removed those reads from the BAM file and re-indexed it. The pipeline then proceeded until it hit the error above. A test run with a small number of samples completed and produced the full output.
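
For anyone hitting the same negative-position issue, here is a minimal sketch of one way to drop such reads and write a re-indexed BAM, using Rsamtools (not claiming this is the only way; a samtools view / awk pipeline works just as well):

library(Rsamtools)
library(S4Vectors)

in_bam  <- "filtered.Aligned.GeneTagged.sorted.bam"
out_bam <- "filtered.Aligned.GeneTagged.sorted.clean.bam"

# Keep only records whose 1-based leftmost mapping position is positive.
keep_pos <- FilterRules(list(posOK = function(x) !is.na(x$pos) & x$pos > 0))

# filterBam() applies the rule to the fields requested in `what` and, by
# default, indexes the destination BAM it writes.
filterBam(in_bam, out_bam,
          filter = keep_pos,
          param  = ScanBamParam(what = "pos"))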

For now, I am planning to split the input files into chunks and process them in batches of ~300 samples each, then merge the generated count tables. Is that a viable option, or is it better to process the entire dataset together?
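
If splitting is the way to go, this is roughly how I would merge the per-batch count tables afterwards. A minimal sketch: the .rds paths are hypothetical, and it assumes each batch yields a genes x cells sparse matrix (in practice that matrix would first be pulled out of each batch's zUMIs expression output):

library(Matrix)

# Hypothetical per-batch outputs, each assumed to be a genes x cells sparse matrix.
batch_files <- c("batch1_counts.rds", "batch2_counts.rds")
mats <- lapply(batch_files, readRDS)

# Pad every matrix with zero rows for genes it lacks, so all batches share the
# same gene ordering, then column-bind the cells.
all_genes <- Reduce(union, lapply(mats, rownames))
pad <- function(m) {
  missing <- setdiff(all_genes, rownames(m))
  if (length(missing) > 0) {
    zeros <- Matrix(0, nrow = length(missing), ncol = ncol(m), sparse = TRUE,
                    dimnames = list(missing, colnames(m)))
    m <- rbind(m, zeros)
  }
  m[all_genes, , drop = FALSE]
}
merged <- do.call(cbind, lapply(mats, pad))
saveRDS(merged, "merged_counts.rds")

Since all batches would be run against the same annotation, the gene sets should already match and the padding is just a safety net.
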
cohort.yaml.txt
