-
Notifications
You must be signed in to change notification settings - Fork 644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance Resilience to Empty FASTQ Files with Logging Functionality #4842
Enhance Resilience to Empty FASTQ Files with Logging Functionality #4842
Conversation
cdcc03b
to
df65c65
Compare
7649d1b
to
19f90d2
Compare
Hi @matthdsm, one important note, to pass the tests, I have commented out the pytest_modules.yml for subworkflows/bcl_demultiplex, it seems the md5sum's dont match with the ones expected. This PR is related to this one in nf-core/demultiplex:dev how can we proceed? your advice will be very much appreciated. Thanks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the tests failing, I tried it from master
and they fail too, so it must be unrelated to your PR 🤷 That being said, we are migrating everything from pytest to nf-test, so it would be much appreciated if you could update the tests in this PR. You can find the instructions here 👉 https://nf-co.re/docs/contributing/subworkflows#migrating-from-pytest-to-nf-test. This will also update the md5sums in the process.
Also, if you want the logfile to be available to the outside and other modules downstream I think it should be part of an output channel (although I'm not entirely sure about this, you might want to get a second opinion).
@Aratz, this good to go you think? |
I'm still wondering if Also, maybe I was a bit too quick to fix the tests, I just noticed the snap file was basically empty, so this should be addressed too. |
Co-authored-by: Matthias De Smet <11850640+matthdsm@users.noreply.github.com>
3221e14
to
3a6504e
Compare
…o test different container engines or defaults to Docker
Do you need any help to get these changes merged in? |
Yes, I could definelty use a hand, I am struggling with the nf-tests. |
Ok, can help you this week. |
@glichtenstein , should be good to merge now. |
@glichtenstein , I don't seem to be able to edit
Adding before the current line 644, to be able to exclude the conda test. |
…est. - profile: conda; path: subworkflows/nf-core/bcl_demultiplex
Oh, dont know why you cant edit, but I added those lines now. |
I couldn't edit it because it changes the github actions that are run, so github sensibly didn't want some random person to make changes to that on your repository. |
Awesome, Thanks for everything!!!! |
@glichtenstein Thanks for all the work you have done on this! I filed an issue related to the use of
However, I think that this sub workflow should not make decisions on which files are valid and invalid. In fact, an empty FASTQ is a valid output of Illumina's bclconvert and bcl2fastq. Instead, this sub workflow should emit all the files that are created (even if they are empty). This is useful in situations where a downstream user wants to count the # of reads in the FASTQ files to know that a certain barcode/index received zero counts, instead of inferring that is the case. If it is necessary to filter the output, the sub workflow should have emits for empty FASTQ files as @Aratz mentioned. Something like |
@k1sauce Thank you! Those are indeed very good and fair points. Basically, the issue arised because falco will return an error and exit when encountered with an empty fastq stating it was invalid and the entire workflow then exits. So even though BCLConvert has finished the workflow wont continue with the subsequent tasks, like Falco. So the idea is mainly to skip over those empty fastqs so that the workflow can succeed and reach the end so that one can evaluate the cause of empty fastqs, l.e., missed/wrong barcodes in SampleSheet.csv. The output file is intended as a log file to keep track of which files where empty or dimmed invalid by falco during the processing. I would very much like to enhance this and make it optional, so it may not disrupt your work, maybe with a flag like |
@glichtenstein Thanks for the background info. After thinking it over, I think the best thing to do is to include a boolean flag in the meta map of the FASTQ output channel. That way the user can decide how to handle those empty FASTQ files. |
Okie, I will try to do both, use nextflow's file() and add a boolean flag, will create a new branch. |
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).Title of PR:
"Enhanced Resilience to Empty or Invalid FASTQ Files with Logging Functionality"
Description:
Changes Made:
Introduction of a Log File:
${params.outdir}/invalid_fastqs.log
) for tracking empty or invalid FASTQ files, enhancing post-run analysis.Modified
generate_fastq_meta
Function:generate_fastq_meta
to gracefully handle empty or invalid FASTQ files, preventing pipeline interruption.logFile
as a new parameter for logging these occurrences.New Function -
appendToLogFile
:appendToLogFile
, a utility function for appending messages to the log file. This function improves maintainability and includes a check to handle Groovy GStrings.Reason for Changes:
appendToLogFile
and the parameterization oflogFile
align with best coding practices, enhancing code reusability and organization.Impact:
Use Cases Addressed:
Incorrect Barcode in Sample Sheet:
Library Preparation Errors:
invalid_fastqs.log
file will assist in pinpointing these samples for reprocessing or further laboratory investigation.