interleaved output is not reproducible with multiple threads #552

Open
briankegerreis opened this issue Mar 15, 2024 · 1 comment

@briankegerreis

When writing interleaved output to stdout with the --stdout flag, reads appear to be written in a nondeterministic order: the md5sums from repeated runs differ.

Steps to reproduce with v0.23.3:

for TRY in 1 2 3; do fastp -w $THREADS --dont_eval_duplication -i in1.fq.gz -I in2.fq.gz -A -G -L -Q --stdout > interleave_attempt${TRY}.fastq; done
md5sum interleave*
for TRY in 1 2 3; do fastp -w $THREADS --dont_eval_duplication -i in1.fq.gz -I in2.fq.gz -A -G -L -Q -o split_attempt${TRY}_1.fastq -O split_attempt${TRY}_2.fastq; done
md5sum split*
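
To make the comparison explicit rather than reading the hashes by eye, a small helper can report whether all files share one checksum. This is a minimal sketch, not part of fastp: the function name `all_identical` is hypothetical, and it assumes bash plus GNU coreutils `md5sum`, as used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of fastp): succeeds only if every file
# passed as an argument has the same md5 checksum.
all_identical() {
    local distinct
    # md5sum prints "<hash>  <file>"; keep the hash column and count distinct values.
    distinct=$(md5sum -- "$@" | awk '{print $1}' | sort -u | wc -l)
    [ "$distinct" -eq 1 ]
}

# Example usage, with the file names from the reproduction steps:
#   all_identical interleave_attempt*.fastq && echo "reproducible" || echo "runs differ"
#   all_identical split_attempt*_1.fastq    && echo "reproducible" || echo "runs differ"
```
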
@ckrushton

I have also encountered this. On investigation, it appears to be the result of a race condition when writing to stdout with multiple threads. When writing to an output file, FASTP seems to always be consistent, because a thread that completes faster than the others waits until they complete before writing to the output. Unfortunately, this logic is not applied when writing to stdout.

We ended up using --stdout with a single thread in our workflow, then modifying the source code directly to allow FASTP to work with named pipes (currently it appears to write the output in a consistent order, but then blocks and hangs after it finishes processing).
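
The ordering problem described above can be illustrated outside of FASTP with a small shell model. This sketch does not reproduce FASTP's implementation: the function names are hypothetical, the "workers" are background jobs that only sleep and print, and the ordered variant uses per-worker temporary files as a stand-in for whatever waiting logic the file writer applies. It assumes bash and a `sleep` that accepts fractional seconds (as GNU `sleep` does).

```shell
#!/usr/bin/env bash
# Illustrative model only; this is not FASTP's code.

# Hypothetical worker: "processes" chunk N. Later chunks finish sooner,
# so completion order is the reverse of input order.
process_chunk() {
    sleep "0.$(( 4 - $1 ))"
    printf 'chunk %s\n' "$1"
}

# Unordered: every worker writes straight to the shared stdout, so the
# output order depends on which worker finishes first.
unordered() {
    local i
    for i in 1 2 3; do process_chunk "$i" & done
    wait
}

# Ordered: every worker writes to its own temporary file, and the files
# are concatenated in chunk order only after all workers are done.
ordered() {
    local i dir
    dir=$(mktemp -d)
    for i in 1 2 3; do process_chunk "$i" > "$dir/$i" & done
    wait
    cat "$dir/1" "$dir/2" "$dir/3"
    rm -rf "$dir"
}
```

Calling `ordered` always emits the chunks in input order regardless of which worker finishes first, while the output of `unordered` follows completion order and can change from run to run.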
