New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Elembio bases2fastq #2044
base: main
Are you sure you want to change the base?
Elembio bases2fastq #2044
Conversation
… add-b2f-support-new
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
… add-b2f-support-new
…ssed through clean_s_name
Set |
On more issue is that when run on all test inputs, it reports that no samples are found:
Probably in this case it should raise an exception to be more explicit about the errors. UPD: one for a separate PR (related to #2047) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
@vladsavelyev - getting the same thing here as we just fixed in bclconvert, where the module exits with no results if there are mixtures of runs: ❯ multiqc -f .
/// MultiQC 🔍 | v1.16.dev0 (de0264c)
| multiqc | Search path : /Users/ewels/GitHub/MultiQC/TestData/data/modules/bases2fastq
| searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 303/303
| bases2fastq | Found 13 total RunStats.json
| bases2fastq | More than one read length configurations are found in the dataset:150+150,75+75
| bases2fastq | These runs have 150+150 read length configuration:DVT-0008__2d47,DVT-0013__04fd,DVT-0015__1fa3
| bases2fastq | These runs have 75+75 read length configuration:DVT-0023__38a4,DVT-0019__85ac,DVT-0018__e1b7,DVT-0020__f923,DVT-0031__0e11,DVT-0030__8a89,DVT-0025__2b41,DVT-0024__17cf,DVT-0021__3f2c,DVT-0022__08c4
| multiqc | No analysis results found. Cleaning up..
| multiqc | MultiQC complete Do you think we can refactor to be able to still generate a report successfully in this case? |
contents_re: ".*NumPolonies.*" | ||
num_lines: 100 | ||
bases2fastq/group: | ||
fn: "*_b2fgroup.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to have any test data for the group file, do we?
https://github.com/ewels/MultiQC_TestData/tree/master/data/modules/bases2fastq
Would it be possible to add some? @blajoie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did one last pass through and added a few changes:
- Worked around the use of pandas which we don't install along with MultiQC,
- Got rid of
UserWarning
. We want to have only oneUserWarning
per module, which indicates that no data is found. Otherwise, a module should ideally do its best to generate a report for whatever data it found. It doesn't seem like it depends on the read lengths to be identical across all samples, so we can get rid of theUserWarning
as well.
What we are missing is some test data for the following:
bases2fastq/project:
fn: "*_RunStats.json"
contents_re: ".*NumPolonies.*"
num_lines: 100
bases2fastq/group:
fn: "*_b2fgroup.csv"
It would be nice to add it, otherwise a fair share of the module code is not tested at all.
@blajoie / @YuheCheng62 - we are close to getting the module merged now on the code side. However, the size of the test data is presenting some fairly significant problems. For starters, it's bloated the MultiQC test data a lot, making all the CI checks run quite a lot slower. It's 141MB, which is 30% of the size of the entire test data repo - for one module out of 130 😓 Also the module runs very slowly. Probably the most comparable MultiQC module is bclconvert, and that runs in @blajoie - we spoke on Slack about potentially being able to generate a different flavour of outputs, which are not huge files containing everything in one place. But instead having a lightweight
Is this a possibility in the near-ish future? I think that the report filesize may be a blocker for merging into MultiQC as it stands.. |
Sorry all - had to pivot to some other high priority work. Hoping to come back to this soon to finish this off! Appreciate your patience on this ! :) |
We don't have permission to push directly to the source fork in #1990, so creating this branch + PR to continue work so that we can push minor changes directly to be able to finish the merge.
Please see #1990 for discussion and context.