New module: Hostile #2501

SumeetTiwari07 · 2024-04-23T16:34:41Z

This comment contains a description of changes (with reason)

There is example tool output for tools in the https://github.com/MultiQC/test-data repository or attached to this PR
Code is tested and works locally (including with --strict flag)
docs/modulename.md is created
Everything that can be represented with a plot instead of a table is a plot
Report sections have a description and help text (with self.add_section)
There aren't any huge tables with > 6 columns (explain reasoning if so)
Each table column has a different colour scale to its neighbour, which relates to the data (e.g. if high numbers are bad, they're red)
Module does not do any significant computational work

SumeetTiwari07

""

multiqc/utils/search_patterns.yaml

multiqc/modules/hostile/hostile.py

vladsavelyev · 2024-04-26T19:13:17Z

Thanks a lot for the contribution!

Left a few comments.

Also, please add calls to self.ignore_samples (see https://multiqc.info/docs/development/modules/#filtering-by-parsed-sample-names) and self.add_software_version (see https://multiqc.info/docs/development/modules/#saving-version-information).

        ...
        for f in self.find_log_files("hostile", filehandles=True):
            self.parse_logs(f)

        self.add_software_version(...)

        self.parse_data = self.ignore_samples(self.parse_data)

        if len(self.parse_data) == 0:
            raise ModuleNoSamplesFound

        log.info(f"Found {len(self.parse_data)} reports")

        self.write_data_file(self.parse_data, "multiqc_hostile")
        ...
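
The ordering in the snippet above matters: samples are filtered before the emptiness check, so a run in which every sample is ignored still raises. A minimal, self-contained sketch of that pattern follows. The names `filter_ignored`, `collect_samples` and `NoSamplesFound`, and the glob-based matching, are hypothetical stand-ins for illustration and are not MultiQC's actual implementation.

```python
from fnmatch import fnmatch


class NoSamplesFound(Exception):
    """Hypothetical stand-in for the exception raised when no samples remain."""


def filter_ignored(data_by_sample, ignore_patterns):
    """Drop samples whose name matches any glob pattern (assumed matching rule)."""
    return {
        s_name: values
        for s_name, values in data_by_sample.items()
        if not any(fnmatch(s_name, pattern) for pattern in ignore_patterns)
    }


def collect_samples(data_by_sample, ignore_patterns):
    """Filter first, then fail if nothing is left to report."""
    kept = filter_ignored(data_by_sample, ignore_patterns)
    if len(kept) == 0:
        raise NoSamplesFound
    return kept


# The dictionary is keyed by sample name, which is what the filter relies on.
parsed = {"SAMPLE-A1": {"reads_in": 241805}, "control-1": {"reads_in": 1000}}
kept = collect_samples(parsed, ignore_patterns=["control-*"])
print(sorted(kept))
```

Because the filter looks only at dictionary keys, it works correctly only if those keys are the final sample names.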

vladsavelyev · 2024-04-26T19:14:24Z

Please also create a PR in https://github.com/MultiQC/test-data with test examples.

SumeetTiwari07 · 2024-04-26T22:32:31Z

Thank you for reviewing the code and all the suggestions.
I made the recommended changes and updated the code as suggested (069e1c6, 51f61da).
I also added the test data (MultiQC/test-data#319).

multiqc/modules/hostile/hostile.py

vladsavelyev · 2024-04-29T17:35:40Z

multiqc/modules/hostile/hostile.py

+        data = {}
+
+        for f_name, values in self.parse_data.items():
+            s_name = values[0]["fastq1_in_name"].split(".")[0]


self.parse_data is already assumed to be indexed by sample name (self.ignore_samples relies on that assumption), so we can't create a different sample name here. It would be better to move this line into parse_logs, and to clean the name with self.clean_s_name() instead of calling split manually (the function knows which extensions to remove).

        for f_name, values in self.parse_data.items():
            database = os.path.basename(values[0]["index"])
            data[f_name] = {"Cleaned reads": values[0]["reads_out"], "Host reads": values[0]["reads_removed"]}

Removed that line.
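
To illustrate why splitting on the first dot is fragile, here is a small sketch that compares it with stripping a list of known suffixes. The helper `strip_known_extensions` and the suffix list are hypothetical and only approximate the idea behind a dedicated cleaning function; the file name `patient.01.fastq.gz` is made up for the example.

```python
# Assumed suffix list for illustration; a real tool would read this from its config.
KNOWN_EXTENSIONS = (".gz", ".fastq", ".fq")


def strip_known_extensions(file_name, extensions=KNOWN_EXTENSIONS):
    """Repeatedly remove known suffixes from the end of a file name."""
    name = file_name
    stripped = True
    while stripped:
        stripped = False
        for ext in extensions:
            # Never strip the whole name away.
            if name.endswith(ext) and len(name) > len(ext):
                name = name[: -len(ext)]
                stripped = True
    return name


name = "patient.01.fastq.gz"
naive = name.split(".")[0]            # cuts at the first dot, losing ".01"
cleaned = strip_known_extensions(name)
print(naive, cleaned)
```

With a naive split, two inputs such as `patient.01.fastq.gz` and `patient.02.fastq.gz` would collapse to the same sample name, whereas suffix stripping keeps them distinct.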

vladsavelyev · 2024-04-29T17:37:11Z

multiqc/modules/hostile/hostile.py

+        self.add_section(
+            name="Reads Filtering",
+            anchor="hostile-reads",
+            description=f"This plot shows the number of cleaned reads vs host-reads per sample (database index: {database}).",


database is assigned inside the loop, separately for each sample, but this line is outside the loop. We need to decide how to handle samples that have different databases.

Since Hostile does not allow screening against multiple databases in a single run, a batch of sequences will be screened against the same database. However, I modified the code so that the database name is appended to the category names.

        for f_name, values in self.parse_data.items():
            database = os.path.basename(values[0]["index"])
            data[f_name] = {
                f"Cleaned reads (DB: {database})": values[0]["reads_out"],
                f"Host reads (DB: {database})": values[0]["reads_removed"],
            }

        ## categories
        all_categories = [inner_key for outer_dict in data.values() for inner_key in outer_dict.keys()]
        cats = list(set(all_categories))
        # cats = ["Cleaned reads", "Host reads"]

Does it make sense?

Suggested change

-            description=f"This plot shows the number of cleaned reads vs host-reads per sample (database index: {database}).",
+            description=f"This plot shows the number of cleaned reads vs host-reads per sample (DB: Database).",
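
One way to handle the question raised in this thread is to collect the database names across all samples first and build the description from that set, rather than relying on a loop variable. The sketch below assumes each sample's report is a dict with an `index` field, as in the test data; the functions `databases_used` and `section_description` and the index name `other-index` are hypothetical.

```python
import os


def databases_used(reports_by_sample):
    """Return the sorted, de-duplicated database names across all samples."""
    return sorted({os.path.basename(r["index"]) for r in reports_by_sample.values()})


def section_description(reports_by_sample):
    """Build the description text, listing every database when they differ."""
    text = "This plot shows the number of cleaned reads vs host reads per sample"
    dbs = databases_used(reports_by_sample)
    if len(dbs) == 1:
        return f"{text} (database index: {dbs[0]})."
    return f"{text} (multiple database indexes: {', '.join(dbs)})."


same_db = {"SAMPLE-A1": {"index": "human-t2t-hla"}, "SAMPLE-A2": {"index": "human-t2t-hla"}}
mixed_db = {"SAMPLE-A1": {"index": "human-t2t-hla"}, "SAMPLE-B1": {"index": "other-index"}}
print(section_description(same_db))
print(section_description(mixed_db))
```

This keeps the bar plot categories identical across samples and moves the database information into the section text, where a mismatch can also be reported as a warning.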

multiqc/utils/search_patterns.yaml

vladsavelyev · 2024-04-29T17:43:35Z

multiqc/modules/hostile/hostile.py

+            log.warning(f"Could not parse JSON file {json_file['f']}")
+            return
+
+        if len(parse_data) > 0:


Can there be more than one entry in the JSON file? How should that be handled? Please add such an example to the test-data repo.

No, there won't be more than one entry in the JSON file.
~~if len(parse_data) > 0:~~ deleted from the code.

Suggested change

-        if len(parse_data) > 0:
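
As a sketch of the parsing behaviour discussed in this thread, the function below treats a log as a JSON list with exactly one report and rejects anything else. The single-entry assumption comes from the reply above; returning `None` for other shapes is an illustrative choice, and `parse_hostile_json` is a hypothetical name rather than the module's actual function.

```python
import json


def parse_hostile_json(text):
    """Return the single report dict from a JSON log, or None if it is unusable."""
    try:
        entries = json.loads(text)
    except json.JSONDecodeError:
        # Malformed JSON: the caller can log a warning and skip the file.
        return None
    # Assumption from the discussion: exactly one entry per file.
    if not isinstance(entries, list) or len(entries) != 1:
        return None
    return entries[0]


good = parse_hostile_json('[{"index": "human-t2t-hla", "reads_in": 241805}]')
bad = parse_hostile_json("not json")
empty = parse_hostile_json("[]")
print(good, bad, empty)
```

Checking the list length explicitly makes the single-entry assumption visible in the code instead of leaving it implicit in an index lookup.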

vladsavelyev · 2024-04-29T17:56:13Z

multiqc/modules/hostile/hostile.py

+        for f_name, values in self.parse_data.items():
+            s_name = values[0]["fastq1_in_name"].split(".")[0]
+            database = os.path.basename(values[0]["index"])
+            data[s_name] = {"Cleaned reads": values[0]["reads_out"], "Host reads": values[0]["reads_removed"]}


Should reads_out and reads_removed sum up to reads_in? That doesn't appear to be the case in the test data:

    cat test_data/data/modules/hostile/hostile.SAMPLE-A1.json
    [
        {
            "version": "1.1.0",
            "aligner": "bowtie2",
            "index": "human-t2t-hla",
            "options": [],
            "fastq1_in_name": "SAMPLE-A1.fastq.gz",
            "fastq1_in_path": "SAMPLE-A1.fastq.gz",
            "fastq1_out_name": "SAMPLE-A1.clean.fastq.gz",
            "fastq1_out_path": "SAMPLE-A1.clean.fastq.gz",
            "reads_in": 241805,
            "reads_out": 234757,
            "reads_removed": 70484,
            "reads_removed_proportion": 0.2915
        }
    ]

Bar plots assume that categories do not overlap.

Yes, it should. I will update the count in the reports.

Pull request raised to change the counts.

322

321

We should use real outputs from the tool in the test data. Can you run Hostile to generate some?
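
The consistency check discussed above can be expressed in a few lines. This sketch uses the counts from the SAMPLE-A1 test file quoted earlier in the thread; `reads_add_up` is a hypothetical helper name.

```python
def reads_add_up(report):
    """True if the kept and removed reads account for exactly the input reads."""
    return report["reads_out"] + report["reads_removed"] == report["reads_in"]


# Counts copied from the SAMPLE-A1 test file quoted in this thread.
report = {
    "reads_in": 241805,
    "reads_out": 234757,
    "reads_removed": 70484,
    "reads_removed_proportion": 0.2915,
}

consistent = reads_add_up(report)
excess = report["reads_out"] + report["reads_removed"] - report["reads_in"]
# The removed count does agree with the reported proportion of the input.
proportion_ok = round(report["reads_removed"] / report["reads_in"], 4) == report["reads_removed_proportion"]
print(consistent, excess, proportion_ok)
```

For a stacked bar plot, a failed check means the bar height no longer equals the number of input reads, which is why real tool output is preferable to hand-edited counts.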

multiqc/modules/hostile/hostile.py

vladsavelyev

I pushed some updates, and with the test example MultiQC/test-data@2eee448, we are good to go - merging this PR now.

SumeetTiwari07 · 2024-05-02T10:47:09Z

I pushed some updates, and with the test example MultiQC/test-data@2eee448, we are good to go - merging this PR now.

Thank you so much. I also added some more logs from a metagenomic sampling.

SumeetTiwari07 added 6 commits April 23, 2024 17:32

New Module: Hostile

fd7c948

New Module: Hostile

5e06750

updates v1

b3bb9e7

Merge branch 'main' into main

b0edabf

update v2

8fc09cf

Merge branch 'main' of github.com:SumeetTiwari07/MultiQC

4def979

SumeetTiwari07 commented Apr 25, 2024

vladsavelyev added the module: new label Apr 26, 2024

Merge branch 'MultiQC:main' into main

d9c170d

vladsavelyev reviewed Apr 26, 2024

multiqc/utils/search_patterns.yaml

vladsavelyev reviewed Apr 26, 2024

multiqc/modules/hostile/hostile.py

vladsavelyev reviewed Apr 26, 2024

multiqc/modules/hostile/hostile.py

SumeetTiwari07 added 4 commits April 26, 2024 22:45

review 1

c73d63e

Merge branch 'main' of github.com:SumeetTiwari07/MultiQC

09f6749

review 1

069e1c6

review 1: minor update

51f61da

SumeetTiwari07 requested a review from vladsavelyev April 26, 2024 22:36

SumeetTiwari07 and others added 2 commits April 29, 2024 09:55

Merge branch 'main' into main

a27f8a2

Merge branch 'main' into main

63a7ffe

vladsavelyev reviewed Apr 29, 2024

multiqc/modules/hostile/hostile.py

vladsavelyev reviewed Apr 29, 2024

multiqc/modules/hostile/hostile.py

vladsavelyev reviewed Apr 29, 2024

multiqc/utils/search_patterns.yaml

Merge branch 'main' into SumeetTiwari07/main

d15e234

vladsavelyev reviewed Apr 29, 2024

Refactoring

d7d0b4e

vladsavelyev reviewed Apr 29, 2024

multiqc/modules/hostile/hostile.py

vladsavelyev added the waiting: response Waiting for more information from user label May 1, 2024

vladsavelyev force-pushed the main branch from 5ca273f to d7d0b4e Compare May 2, 2024 10:37

Merge branch 'main' into main

520576f

vladsavelyev self-requested a review May 2, 2024 10:42

vladsavelyev approved these changes May 2, 2024

vladsavelyev added this to the MultiQC v1.22: Pydantic milestone May 2, 2024

vladsavelyev changed the title ~~New Module: Hostile~~ New module: Hostile May 2, 2024

vladsavelyev mentioned this pull request May 2, 2024

Hostile: New logs MultiQC/test-data#323

Merged

Warning about diff databases. Check that bar sums up

53b2dc2

vladsavelyev merged commit ba3522b into MultiQC:main May 2, 2024
4 checks passed

vladsavelyev mentioned this pull request May 6, 2024

hostile not actually supported in v1.21 ("Invalid value for '-m' / '--module': 'hostile' is not one of . . .") #2539

Closed

4 tasks
