Example stats #96

Open
Phlya opened this issue Apr 25, 2018 · 8 comments

Comments

@Phlya
Member

Phlya commented Apr 25, 2018

Hi guys,

It would be great to add some example stats from good and bad experiments, with explanations of which step in the protocol might have failed and how to interpret the numbers. I am currently troubleshooting why my recent Hi-C experiments have not been working well, and with the coded pair type annotation this part is a little more complicated than I expected.

Also, more of a pairsamtools issue, but related: the W pair type is still called C in the docs there (they're the same thing, right? I seem to remember this being mentioned at some point).

@Phlya
Member Author

Phlya commented May 13, 2018

Also, as we briefly discussed with Sergey and Johan, having fragment-level stats like those in hiclib and other pipelines (e.g. dangling ends, self-circles, etc.) would be very helpful for troubleshooting failed experiments.
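To make the fragment-level categories concrete, here is a minimal Python sketch. It assumes that each pair already carries the index of the restriction fragment each side falls in, and it uses the usual definitions: both reads on the same fragment pointing toward each other is a dangling end, and pointing away from each other is a self-circle. The function names, argument layout, and category labels are hypothetical, not part of pairsamtools or hiclib.

```python
from collections import Counter


def classify_pair(chrom1, pos1, strand1, frag1, chrom2, pos2, strand2, frag2):
    """Assign a fragment-level category to one pair (hypothetical helper).

    frag1/frag2 are restriction-fragment indices; pos1/pos2 are the 5' ends of
    the reads, and strand '+' means the read points toward larger coordinates.
    """
    if chrom1 != chrom2 or frag1 != frag2:
        return "different_fragments"
    # Put the upstream side first so the orientation string is well defined.
    if pos1 > pos2:
        strand1, strand2 = strand2, strand1
    orientation = strand1 + strand2
    if orientation == "+-":
        # Reads point toward each other: an unligated fragment.
        return "dangling_end"
    if orientation == "-+":
        # Reads point away from each other: a fragment ligated to itself.
        return "self_circle"
    # '++' or '--' on one fragment is expected from neither product.
    return "same_fragment_same_strand"


def fragment_stats(pairs):
    """Tally categories over an iterable of 8-tuples in classify_pair's order."""
    return Counter(classify_pair(*p) for p in pairs)
```

A high fraction of dangling ends would then point at ligation rather than digestion, but how to interpret the numbers is exactly what the example stats requested above would clarify.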

@sergpolly
Member

Let's collect all of the stats update requests in one place.
So far we have this:
https://github.com/mirnylab/pairsamtools/issues/59
https://github.com/mirnylab/pairsamtools/issues/56
https://github.com/mirnylab/pairsamtools/issues/54
https://github.com/mirnylab/pairsamtools/issues/5
#94
#90

Please, @golobor, @Phlya, @nvictus, review the list and prioritize it, and let's go from there.

@gfudenberg
Member

I couldn't find these in the referenced posts, but it would also be nice to have:
a) P(s) for each read orientation separately: this is useful for finding the distance at which the curves converge, beyond which reads can be interpreted as "just measuring contact frequency"
b) the number of reads involving mitochondrial DNA is a nice stat (mito_vs_anyReads, mito_vs_mito, etc.)
c) the number of single-sided and double-sided read pairs

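A rough sketch of points (a) and (b) in Python. It assumes that pairs arrive as simple (chrom1, pos1, strand1, chrom2, pos2, strand2) tuples, that the mitochondrial chromosome is named chrM, and that "mito_vs_any" means at least one side on chrM. These names and the tuple layout are assumptions for illustration, not the pairsamtools stats format.

```python
import bisect
from collections import Counter, defaultdict

MITO = "chrM"  # assumed name of the mitochondrial chromosome


def orientation_and_mito_stats(pairs, bin_edges):
    """Count cis pairs per read orientation and distance bin, plus mito pairs.

    pairs: iterable of (chrom1, pos1, strand1, chrom2, pos2, strand2) tuples.
    bin_edges: sorted distance edges; bin i covers edges[i] <= s < edges[i+1].
    """
    ps = defaultdict(Counter)  # orientation ('+-', '-+', '++', '--') -> {bin: count}
    mito = Counter()
    for chrom1, pos1, strand1, chrom2, pos2, strand2 in pairs:
        n_mito = (chrom1 == MITO) + (chrom2 == MITO)
        if n_mito >= 1:
            mito["mito_vs_any"] += 1
        if n_mito == 2:
            mito["mito_vs_mito"] += 1
        # P(s) is only defined for cis pairs; mito-mito pairs are left out.
        if chrom1 != chrom2 or n_mito:
            continue
        # Orient the pair so side 1 is upstream, as in an upper-triangular layout.
        if pos1 > pos2:
            pos1, pos2, strand1, strand2 = pos2, pos1, strand2, strand1
        b = bisect.bisect_right(bin_edges, pos2 - pos1) - 1
        if 0 <= b < len(bin_edges) - 1:
            ps[strand1 + strand2][b] += 1
    return ps, mito
```

Plotting the four orientation curves on the same axes would then show the distance beyond which they overlap.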
@Phlya
Member Author

Phlya commented May 15, 2018

I personally think that adding new kinds of stats is more important in principle; the different saving/printing options can be implemented later. Also, I don't think reporting optical duplicates is important (can we really do anything about them when preparing libraries? I doubt it), but maybe I am misunderstanding something.

@Phlya
Member Author

Phlya commented May 15, 2018

Also note that the fragment-level stats would require matching pairs to restriction fragments. But perhaps, with both inputs sorted and indexed, this won't be very expensive?
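One way to do the matching, assuming the restriction sites are available as a list of cut positions per chromosome: once the sites are sorted, a binary search gives each side's fragment index in O(log n) per lookup, without requiring the pairs themselves to be sorted. The function names below are hypothetical. If the pairs are sorted by position on one side, that side could instead be matched with a single linear pass over the sites.

```python
import bisect


def build_fragment_index(cut_sites):
    """cut_sites: dict mapping chromosome -> iterable of cut positions."""
    return {chrom: sorted(sites) for chrom, sites in cut_sites.items()}


def fragment_id(index, chrom, pos):
    """Index of the restriction fragment containing pos on chrom.

    Fragment 0 lies upstream of the first cut site; a position exactly at a
    cut site is assigned to the downstream fragment (an arbitrary convention).
    """
    return bisect.bisect_right(index[chrom], pos)
```

For example, with cut sites at 100, 250 and 400 on chr1, position 300 falls in fragment 2.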

@Phlya
Member Author

Phlya commented May 20, 2018

I guess it should be possible to address @gfudenberg's point (a) quite easily, since these counts are already present in the output of stats (https://github.com/mirnylab/pairsamtools/issues/68). Perhaps, though, the bins could be optimized a bit to produce smoother curves?
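If the goal is smoother curves, one common option is to use geometrically spaced bins with more bins per decade and then normalize each bin's count by its width and by the total, which turns raw counts into a contact probability per bp. A minimal sketch with hypothetical function names; it is independent of how pairsamtools stats currently defines its bins.

```python
import math


def log_bin_edges(min_dist, max_dist, bins_per_decade):
    """Integer distance edges spaced evenly in log10(s).

    Rounding can make neighbouring short-distance edges coincide, so
    duplicates are dropped; the last edge is >= max_dist.
    """
    n = math.ceil(math.log10(max_dist / min_dist) * bins_per_decade)
    return sorted({round(min_dist * 10 ** (i / bins_per_decade)) for i in range(n + 1)})


def counts_to_ps(counts, edges):
    """Turn per-bin counts into P(s) per bp: count / (total * bin width)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (hi - lo)) for c, lo, hi in zip(counts, edges[:-1], edges[1:])]
```

Raising bins_per_decade gives finer curves at the cost of fewer counts per bin, so a suitable value depends on sequencing depth.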

But are plots in the output part of the plan? As an HTML/PDF report combining different things, or just a folder of individual PNGs/PDFs? Should their generation be part of stats, or a separate job that just takes the output of stats?

@golobor
Member

golobor commented May 20, 2018 via email

@Phlya
Member Author

Phlya commented May 20, 2018

Yeah, I've seen that and even tried to install it once, without success. But considering that pairsamtools stats already calculates so many things, I don't think there is much point in using another QC tool, is there?
