Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new module: sincei #1946

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

new module: sincei #1946

wants to merge 16 commits into from

Conversation

vivekbhr
Copy link

@vivekbhr vivekbhr commented Jul 4, 2023

Added sincei module to summarise read and count-level single-cell QC metrics from sincei

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md has been updated
  • There is example tool output for tools in the https://github.com/ewels/MultiQC_TestData repository or attached to this PR
  • Code is tested and works locally (including with --lint flag)
  • docs/README.md is updated with link to below
  • docs/modulename.md is created
  • Everything that can be represented with a plot instead of a table is a plot
  • Report sections have a description and help text (with self.add_section)
  • There aren't any huge tables with > 6 columns (explain reasoning if so)
  • Each table column has a different colour scale to its neighbour, which relates to the data (eg. if high numbers are bad, they're red)
  • Module does not do any significant computational work

@vladsavelyev vladsavelyev self-requested a review August 25, 2023 16:32
Copy link
Member

@vladsavelyev vladsavelyev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the contribution!

First, a minor comment, I'd try to refactor to not to assume the order for columns (i.e. indexing them with numbers like cols[0], cols[1]). Since the header is available, I'd rather use csv to parse the file reader = csv.DictReader(open(filename), delimiter=",") and index with the column name (e.g. cols['Cell_ID'], row["# Reads"], etc), to make it more robust to future changes of columns and their order. And probably use same column name in MultiQC table whenever reasonable.

A bigger concern is that I feel like using cells as samples is not what MultiQC was designed for. The generated beeswarm plot already hangs my browser even on the test data (I know the beeswarm is really inefficient at the first place and the problem will probably go away with a future new plotting library; but not until we add more data). MultiQC is designed to work with summarised per-sample metrics, and I don't think cells can be counted as samples, as there are way too many of them, and feels too raw to be fed into MultiQC, I think.

It would be good if more summarised per-samples stats can be produced here, e.g. with some different tool downstream.

@vladsavelyev vladsavelyev added the waiting: response Waiting for more information from user label Aug 25, 2023
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @vivekbhr! As @vladsavelyev, I think that we need to rethink what data goes into the report here. The docs have a section called Don’t add everything which I think applies. MultiQC should only summarise, but here we're really just reporting on raw data.

The TSV files have a column called sample (eg. sortChIC-BM-SL1-k4me1-1) so maybe we could summarise stats for all cells under a given sample identifier?

@vivekbhr
Copy link
Author

Thanks a lot for the review and comments @vladsavelyev and @ewels . I understand what you say, I experienced the slow response on the output report myself when using bigger data, so summarizing per-sample stats, such as a sample-wise range and median value for each field could be a better way.

Do you think Its OK to do these calculations from the output .tsv file in the multiqc module, or are you saying that this output should already be reported by sincei? I think calculating sample-wise range and median should not be a big overhead for the multiqc module.

@vladsavelyev
Copy link
Member

Do you think Its OK to do these calculations from the output .tsv file in the multiqc module, or are you saying that this output should already be reported by sincei? I think calculating sample-wise range and median should not be a big overhead for the multiqc module.

I think it should be totally alright to summarise it in MultiQC!

@vladsavelyev vladsavelyev added waiting: changes Issue / PR is on hold, waiting for requested changes and removed waiting: response Waiting for more information from user labels Sep 16, 2023
@vivekbhr
Copy link
Author

vivekbhr commented Dec 10, 2023

@vladsavelyev @ewels I've finally found time to complete the changes as requested. The multiqc module now reports sample summaries (median values), instead of per-cell summary. Also, the column headers are used instead of integer positions. Other minor comments are also resolved. I hope the changes are satisfactory now.

Best,
Vivek

@ewels ewels removed the waiting: changes Issue / PR is on hold, waiting for requested changes label Dec 10, 2023
@vladsavelyev vladsavelyev added this to the MultiQC v1.19 milestone Dec 11, 2023
@vladsavelyev vladsavelyev removed this from the MultiQC v1.19 milestone Dec 13, 2023
@vivekbhr
Copy link
Author

@vladsavelyev @ewels I have made all the requested changes, but the merging is still blocked with "changes requested". Please let me know if there are still unresolved issues. Thanks!

@ewels
Copy link
Member

ewels commented Dec 19, 2023

Thanks @vivekbhr! We'll get back to this to review the updated code as soon as we can.

@vivekbhr
Copy link
Author

vivekbhr commented May 6, 2024

@ewels @vladsavelyev Did you find some time at the end to review the updates?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants