Fragment-level analysis #68

Phlya · 2018-05-18T17:37:37Z

Discussion to address open2c/distiller-nf#96.
Since there is pairsamtools restrict, we can use that as a starting point for adding fragment-level stats. But after some thinking, it seems more complicated to implement properly than I thought.

So, how about we add another column (pair_frags_type?) to the pairs file with the classification of the pairs according to this. And then maybe the actual counting can be a part of the regular stats? Then in distiller restrict should be called before stats, if we include it there (and I think it should be there since people expect this kind of stats)?

Here is a list of the stats that need to be implemented, basing off of hiclib. What did I forget?

Same-fragment pairs
- Dangling ends: pairs oriented towards each other that map to the same restriction fragment.
- Self-circles: the same, except they face in opposite directions.
- Self-ligations: the same, but oriented in the same direction (afaiu was not implemented in hiclib, but seems logical to include it)
- hiclib also has a category "error" here, what does that mean? @mimakaev
Undigested pairs - pairs that probably came from undigested or simply re-ligated fragments, so they face towards each other, map to different fragments, but the whole molecule is shorter than a certain threshold, which is the upper size of fragment sizes in the library. I'm presuming this is the "extraDandlingEnds" in hiclib?
- Can we estimate the fragment size from the pairs, by the way? E.g. as the longest dangling end? Something to think about.
Nothing from the above - "Distal"?
Did I forget something?

The text was updated successfully, but these errors were encountered:

Phlya · 2018-05-18T17:46:38Z

As an alternative to all this, provide a way to easily plot pair frequency by distance by read direction?

nvictus · 2018-05-18T18:00:46Z

Here's an example using the dask-adapter for pairix in bioframe.

It certainly doesn't require either dask or pairix, but would need to be accumulated chunkwise. The functions to compute "contact areas" and pairs scalings were recently moved to cooltools, but they are quite short and reusable: https://github.com/mirnylab/cooltools/blob/master/cooltools/expected.py#L53

Phlya · 2018-05-18T18:34:35Z

I think we just need to tweak the output of stats a bit so that we don't don't need to do anything on the raw pairs - it already saves number of contacts at different distance with different orientations. I can make a plot like this from stats:

I suppose, it lacks normalisation over the areas for each bin...
(This is all for mouse - with no centromeres, whether not counting over-centromeres pairs in stats directly is feasible I am not sure)

Also, if there are any ideas what went wrong with the "Bad" sample, I would be happy to hear your thoughts...

Phlya · 2018-05-19T11:29:44Z

OK, using @nvictus 's suggestion I can quite easily make this using the areas functions from cooltools and counts from stats.

Still not sure how to interpret the difference between samples, though (the bad one was not sequenced deeply, so the curves are not smooth).

Code, just in case it's useful.

def get_areas(d, chroms=pd.read_table('/home/s1529682/eddie/ilia/genomes/mm9/chrfile.txt', names=['chrom', 'length'])):
    bins = np.append(np.unique(d['Start'].values), [np.inf])
    areas = np.zeros(len(bins)-1)
    for i in range(len(chroms)):
        areas = areas + expected.contact_areas(bins, (0, chroms['length'][i]), (0, chroms['length'][i]))
    return areas

def get_scaling_by_orienation(stats):
    d = stats.loc[stats.index.str.startswith('dist_freq')]
    d = d.reset_index()
    d[['Distance', 'Orientation']] = d['index'].str.split('/', expand=True).iloc[:, 1:]
    d = d.drop('index', axis=1)
    d[['Start', 'End']] = d['Distance'].str.split('-', expand=True).apply(lambda x: [int(i.replace('+', '')) if i is not None else np.inf for i in x])
    d['Distance'] = np.exp((np.log(d['Start'])+np.log(d['End']))/2)
    d = d[['Orientation', 'Distance', 'Start', 'End', 'Value']]
    return d
def plot_scaling_by_orienation(stats, ax):
    d = get_scaling_by_orienation(stats)
    areas = get_areas(d)
    for orientation in '+-', '-+', '++', '--':
        sd = d[d['Orientation']==orientation]
        sd['Value']/= areas
        ax.plot(sd['Distance'], sd['Value']/d['Value'].sum(), label=orientation)

gfudenberg · 2018-05-20T16:44:08Z

Are they the same cell type etc? If so, there may be many ligation in trans in the "bad" dataset, by looking at how the P(s) bends more flat around 10^6. If this is the case & you plot the average trans level normalized appropriately, you can see the cis reaching this "noise floor", as we show in MicroC-XL supFig3 ( https://media.nature.com/original/nature-assets/nmeth/journal/v13/n12/extref/nmeth.4025-S1.pdf )

…

On Sat, May 19, 2018, 4:29 AM Ilya Flyamer ***@***.***> wrote: OK, using @nvictus <https://github.com/nvictus> 's suggestion I can quite easily make this using the areas functions from cooltools and counts from stats. [image: image] <https://user-images.githubusercontent.com/2895034/40268105-8082e3d8-5b5f-11e8-97ba-84f90190d944.png> Still not sure how to interpret the difference between samples, though (the bad one was not sequenced deeply, so I suppose the lack of smoothness is expected). Code, just in case it's useful. def get_areas(d, chroms=pd.read_table('/home/s1529682/eddie/ilia/genomes/mm9/chrfile.txt', names=['chrom', 'length'])): bins = np.append(np.unique(d['Start'].values), [np.inf]) areas = np.zeros(len(bins)-1) for i in range(len(chroms)): areas = areas + expected.contact_areas(bins, (0, chroms['length'][i]), (0, chroms['length'][i])) return areas def get_scaling_by_orienation(stats): d = stats.loc[stats.index.str.startswith('dist_freq')] d = d.reset_index() d[['Distance', 'Orientation']] = d['index'].str.split('/', expand=True).iloc[:, 1:] d = d.drop('index', axis=1) d[['Start', 'End']] = d['Distance'].str.split('-', expand=True).apply(lambda x: [int(i.replace('+', '')) if i is not None else np.inf for i in x]) d['Distance'] = np.exp((np.log(d['Start'])+np.log(d['End']))/2) d = d[['Orientation', 'Distance', 'Start', 'End', 'Value']] return d def plot_scaling_by_orienation(stats, ax): d = get_scaling_by_orienation(stats) areas = get_areas(d) for orientation in '+-', '-+', '++', '--': sd = d[d['Orientation']==orientation] sd['Value']/= areas ax.plot(sd['Distance'], sd['Value'], label=orientation) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390398720>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASvzZbzd68_JwlOj82Ail5TqhMb4ICxRks5t0AIpgaJpZM4UFGjp> .

Phlya · 2018-05-20T17:17:52Z

Thank @gfudenberg !
Yes, same cell type. This is just an example, I recently have problems like that with all libraries...
Yeah, there is a lot of trans (60-70%), and yeah, scaling is flat on the right, clearly a lot of random ligations or something like that - what I mean is there are also differences on the left side, and I thought maybe they can hint at what exactly is going wrong? Because I just have no clue what is going wrong now...

gfudenberg · 2018-05-20T17:39:51Z

On Sun, May 20, 2018 at 10:17 AM, Ilya Flyamer ***@***.***> wrote: Thank @gfudenberg <https://github.com/gfudenberg> ! Yes, same cell type. Yeah, there is a lot of trans (60-70%), and yeah, scaling is flat on the right, clearly a lot of random ligations or something like that - what I mean is there are also differences on the left side, and I thought maybe they can hint at what exactly is going wrong? Because I just have no clue what is going wrong now...

Hmm, I don't know if anyone has systematically looked into different failure modes & how they relate to short range orientation P(s). In part because I'd guess that a bunch of experiments that didn't work well never make it to GEO... I've never seen a bump at 30kb before, so if that's real (and not just e.g. noise from not sequencing deeply enough) my first guess would be something went wrong with the restriction step, leaving a bunch of large fragments which then have trouble ligating. Anything funky with the size of 3C products on the gel?

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390497243>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ASvzZZaSH5A3s83S1bbsZnAvZ8F4nlNeks5t0aVBgaJpZM4UFGjp> .

Phlya · 2018-05-20T17:53:35Z

I realize that, I just thought since you guys have been analyzing other people's data maybe you encountered something like that.

Do you mean 3 kb? Digestion works fine though... Size shift after Hi-C with blunt-end ligation is hard to see on the gel sometimes, so last week I just did 3C same way as I do Hi-C to check that the reagents are working fine, and the gel looks great.

(Perhaps not the best place for an extended discussion like this...)

gfudenberg · 2018-05-20T18:00:09Z

On Sun, May 20, 2018 at 10:53 AM, Ilya Flyamer ***@***.***> wrote: I realize that, I just thought since you guys have been analyzing other people's data maybe you encountered something like that. Do you mean 3 kb?

Ah, yes, meant 3kb

Digestion works fine though... Size shift after Hi-C with blunt-end ligation is hard to see on the gel sometimes, so last week I just did 3C same way as I do Hi-C to check that the reagents are working fine, and the gel looks great.

I see... yes, it would be interesting to systematically look at cis/total ratio vs. various short-distance orientation statistics (e.g. ratio of self-circles to dangling ends, etc.) to see if there's some useful troubleshooting signals.

nvictus · 2018-05-20T18:32:52Z

There's an enrichment in the supposed "self-circle" fraction < 10kb with a strange "double hump". I'm not sure how unusual it is, but I would have guessed some issue with fill-in or ligation. Your gel suggests that ligation is not the problem though. Maybe it's downstream stuff like library prep or PCR.

Have you looked at @gibcus's bootcamp slides? There might be some QC ideas to try there, like digesting PCR products. He might be able to chime in too.

gibcus · 2018-05-20T19:12:37Z

Hey Nezar et al., We found self-circles a lot when we were still doing dilute Hi-C. If if the ligation is less inefficient, or there are some cross-linking issues, you may find that T4 ligase can only really ligate fractions to themselves. Do you also have high %trans interactions? Johan From: Nezar Abdennur <notifications@github.com> Sent: Sunday, May 20, 2018 2:33 PM To: mirnylab/pairsamtools <pairsamtools@noreply.github.com> Cc: Gibcus, Johan <Johan.Gibcus@umassmed.edu>; Mention <mention@noreply.github.com> Subject: Re: [mirnylab/pairsamtools] Fragment-level analysis (#68) There's an enrichment in the supposed "self-circle" fraction < 10kb with a strange "double hump". I'm not sure how unusual it is, but I would have guessed some issue with fill-in or ligation. Your gel suggests that ligation is not the problem though. Maybe it's downstream stuff like library prep or PCR. Have you looked at @gibcus<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_gibcus&d=DwMFaQ&c=WJBj9sUF1mbpVIAf3biu3CPHX4MeRjY_w4DerPlOmhQ&r=58olbnbITZR_6zBWImEHoDJH8XsNp5L7t6MHgnxC5TQ&m=yMgaf_dUKYwvYxxXtfFEE6qGLCyHO7ZpMHUWtW3OasQ&s=_Tuyre5TFOw_SJcU3xLeVYyYNnuNeN-SrrigTN9CCXw&e=>'s bootcamp slides<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_hms-2Ddbmi_hic-2Ddata-2Danalysis-2Dbootcamp_blob_master_HiC-2DProtocol.pdf&d=DwMFaQ&c=WJBj9sUF1mbpVIAf3biu3CPHX4MeRjY_w4DerPlOmhQ&r=58olbnbITZR_6zBWImEHoDJH8XsNp5L7t6MHgnxC5TQ&m=yMgaf_dUKYwvYxxXtfFEE6qGLCyHO7ZpMHUWtW3OasQ&s=YS-NJOr53PQLzx00HjMJaLC2CaFSy4rfxywjicrA2CE&e=>? There might be some QC ideas to try there, like digesting PCR products. He might be able to chime in too. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_mirnylab_pairsamtools_issues_68-23issuecomment-2D390501699&d=DwMFaQ&c=WJBj9sUF1mbpVIAf3biu3CPHX4MeRjY_w4DerPlOmhQ&r=58olbnbITZR_6zBWImEHoDJH8XsNp5L7t6MHgnxC5TQ&m=yMgaf_dUKYwvYxxXtfFEE6qGLCyHO7ZpMHUWtW3OasQ&s=dWYeFxrnViV32ivmhu_GTfXSQhLKA_FDHOZF7qmjFD0&e=>, or mute the thread<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AIsEqHzgimAf-5FSVu4ZlM4jl9mbOrPis0ks5t0bbVgaJpZM4UFGjp&d=DwMFaQ&c=WJBj9sUF1mbpVIAf3biu3CPHX4MeRjY_w4DerPlOmhQ&r=58olbnbITZR_6zBWImEHoDJH8XsNp5L7t6MHgnxC5TQ&m=yMgaf_dUKYwvYxxXtfFEE6qGLCyHO7ZpMHUWtW3OasQ&s=94X1wuvrLhJQCID-uYwqT3YarZ1lrXgWUNR5DnhBrWU&e=>.

Phlya · 2018-05-20T20:08:02Z

Hi @nvictus , @gibcus

Thanks for your ideas! Yeah, it's possible fill-in is inefficient, but I am not sure how to test this...

Trans-fraction is very high. Crosslinking is not the problem because this has happened with cells crosslinked by different people and even in different labs.

Phlya · 2018-05-21T13:53:12Z

@gfudenberg you have a good eye! The small peak at 3kb is real and present in a few other new bad data, but never in the old good ones. Here is an extremely bad dataset where it's really obvious:

And by looking at a few of them the height of the peak clearly correlates with how flat the scaling is on the right end! WTF is this?!

golobor · 2018-05-21T14:17:09Z

wow, this looks so weird!! Btw, what are the mapping stats, i.e. fraction of duplicates, unmapped and multimappers? Does it differ much from the "good" libraries?

…

On 21 May 2018 at 09:53, Ilya Flyamer ***@***.***> wrote: @gfudenberg <https://github.com/gfudenberg> you have a good eye! The small peak at 3kb is real and present in a few other new bad data, but never in the old good ones. Here is an extremely bad dataset where it's really obvious: [image: image] <https://user-images.githubusercontent.com/2895034/40310980-6c4f5c04-5d06-11e8-9724-8ef35e88ee4b.png> And by looking at a few of them the height of the peak clearly correlates with how flat the scaling is on the right end! WTF is this?! — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390660044>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCgeCgVltpuhg_l8iNHj43zaEdJD0ks5t0sbIgaJpZM4UFGjp> .

Phlya · 2018-05-21T14:32:47Z

Yeah, there are differences - values as a fraction of total reads for this extremely bad one.

golobor · 2018-05-21T14:37:40Z

huh! that's very concerning, actually! This suggests that the issue might be downstream of ligation, in sequencing. Did you check fastqc of the "bad" samples? Specifically, I'd check if you still have adapters

…

On 21 May 2018 at 10:32, Ilya Flyamer ***@***.***> wrote: Yeah, there are differences - values as a fraction of total reads. [image: image] <https://user-images.githubusercontent.com/2895034/40312765-b8848716-5d0b-11e8-84f3-a5c40b26709c.png> [image: image] <https://user-images.githubusercontent.com/2895034/40312930-2365c96e-5d0c-11e8-9aea-8d093d18b6ca.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390671911>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uChhip7a_N__ipzbNMVXvopqYsjPHks5t0tARgaJpZM4UFGjp> .

Phlya · 2018-05-21T14:40:28Z

No, I haven't checked, actually... Why do you think it might be adaptors? distiller can do that for me, right, if I specify do_fastqc: True?

golobor · 2018-05-21T14:43:00Z

it's just a guess, thinking of what could bump multimappers and nulls... yes, do_fastqc does fastqc :)

…

On 21 May 2018 at 10:40, Ilya Flyamer ***@***.***> wrote: No, I haven't checked, actually... Why do you think it might be adaptors? distiller can do that for me, right, if I specify do_fastqc: True? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390674234>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCgKYM02GOkPNYb0lQeYkqjc-zKUqks5t0tHdgaJpZM4UFGjp> .

Phlya · 2018-05-21T14:52:55Z

No adaptors, but some weird stuff is going on!!! Unequal level of A and T across the whole read! And other things...
If you want to have a look, here is a zip with the output, serum-RI3t is good, RI4t is bad (the first plot here on top is from these).

fastqc.zip

golobor · 2018-05-21T14:57:21Z

also, an intense GC bias...

…

On 21 May 2018 at 10:52, Ilya Flyamer ***@***.***> wrote: No adaptors, but some weird stuff is going on!!! Unequal level of A and T across the whole read! And other things... If you want to have a look, here is a zip with the output, serum-RI3t is good, RI4t is bad (the first plot here on top is from these). fastqc.zip <https://github.com/mirnylab/pairsamtools/files/2023009/fastqc.zip> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390678179>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCkSQ2XzZWjLaWC6ytKbzM129aMrtks5t0tTJgaJpZM4UFGjp> .

Phlya · 2018-05-21T14:57:43Z

(I have to say, they were sequenced on different machines)

Phlya · 2018-05-21T14:59:39Z

Also GATC signature in the beginning is way more pronounced.

Phlya · 2018-05-21T17:13:22Z

So no ideas from this?..

golobor · 2018-05-21T18:18:58Z

any chance you can bug the sequencing facility? This _strongly_ seems like an issue of sequencing, not HiC

…

On 21 May 2018 at 13:13, Ilya Flyamer ***@***.***> wrote: So no ideas from this?.. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390720024>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCt4qPihzg6KAvd6uHoAk2CQQeHSoks5t0vWygaJpZM4UFGjp> .

Phlya · 2018-05-21T19:10:10Z

If you mean there is something wrong on the technical side with the sequencing, some bad samples were sequenced in one place, and some - in another...

But I might ask them for their opinion on what could cause such issues though.

golobor · 2018-05-21T19:16:46Z

hm, i'm confused now : (a) are there "bad" samples that don't have fastqc issues? (b) are there "good" samples that were sequenced in the same facility as the "bad" ones?

…

On 21 May 2018 at 15:10, Ilya Flyamer ***@***.***> wrote: If you mean there is something wrong on the technical side with the sequencing, some samples were sequenced in one place, and some - in another... — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390752587>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCgCnZAx4M1eCTLAuNPRQ1HtO1ef9ks5t0xESgaJpZM4UFGjp> .

Phlya · 2018-05-21T19:21:03Z

a) I haven't run fastqc on all of them
b) yes

I can try to compare them more comprehensively tomorrow

golobor · 2018-05-21T19:21:11Z

my point is that all observations you showed so far are consistent with a single issue behind "bad" samples - faulty sequencing; is there any piece of evidence that contradicts this theory? On 21 May 2018 at 15:14, Anton Goloborodko <goloborodko.anton@gmail.com> wrote:

…

hm, i'm confused now : (a) are there "bad" samples that don't have fastqc issues? (b) are there "good" samples that were sequenced in the same facility as the "bad" ones? On 21 May 2018 at 15:10, Ilya Flyamer ***@***.***> wrote: > If you mean there is something wrong on the technical side with the > sequencing, some samples were sequenced in one place, and some - in > another... > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390752587>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AA3uCgCnZAx4M1eCTLAuNPRQ1HtO1ef9ks5t0xESgaJpZM4UFGjp> > . >

Phlya · 2018-05-21T19:24:25Z

Both good and bad data have been produced in two different facilities, and at one point a few months ago all libraries, wherever we sequence them, started having similar issues.

Phlya · 2018-05-22T12:35:00Z

So, I ran fastqc for many of the good and bad libraries, and the only mostly consistent problem with the bad ones is very high over-representation of a particular group of K-mers in a particular place in the forward reads. This sequence looks like a part of the adaptor sequence, why it is so often found in a particular place, and only in the forward read is completely enigmatic to me, but their actual counts are pretty low and overall adaptor sequences are not very frequent, so I don't think this can be the source of these problems.

golobor · 2018-05-22T18:13:01Z

and what about the A-vs-T bias, is it common for all "bad" datasets?

…

On 22 May 2018 at 08:35, Ilya Flyamer ***@***.***> wrote: So, I ran fastqs for many of the good and bad libraries, and the only mostly consistent problem with the bad ones is very high over-representation of a particular group of K-mers in a particular place in the forward reads. This sequence looks like a part of the adaptor sequence, why it is so often found in a particular place, and only in the forward read is completely enigmatic to me, but their actual counts are pretty low and overall adaptor sequences are not very frequent, so I don't think this can be the source of these problems. [image: image] <https://user-images.githubusercontent.com/2895034/40361958-fe7962ca-5dc2-11e8-8e1c-2820c0213a36.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-390973589>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCt1YuHI8rHQOD276hcPi-w7EUXfHks5t1AX1gaJpZM4UFGjp> .

Phlya · 2018-05-22T18:48:32Z

Well, it seems it's quite commonly present in reverse reads, but not in forward reads, whatever that means, and however that is possible... Here is an example.

Phlya · 2018-05-22T18:51:38Z

And yeah, similar effect in G vs C in reverse reads is also present.

golobor · 2018-05-22T18:54:13Z

I believe all of this is sufficiently concerning to contact the sequencing facility - you paid a lot of money for it, after all. Is the amount of the T->A and C->G bias correlated with how "bad" the library is? To conclude, do you think that the issues of the "bad" libraries can be entirely related to sequencing (and not Hi-C) or there are some observations against that theory?

…

On 22 May 2018 at 14:48, Ilya Flyamer ***@***.***> wrote: Well, it seems it's quite commonly present in reverse reads, but not in forward reads, whatever that means, and however that is possible... Here is an example. [image: image] <https://user-images.githubusercontent.com/2895034/40383537-094abc18-5df9-11e8-9e94-94eb53eade8b.png> [image: image] <https://user-images.githubusercontent.com/2895034/40383552-11b11e2e-5df9-11e8-9ce7-4c68943f927a.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-391100421>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCiM-1aRjB7-hx_u4nqhgGZ_NPVEvks5t1F2BgaJpZM4UFGjp> .

Phlya · 2018-05-22T18:57:14Z

I will check that.

Well, as I said, I think it's unlikely that two facilities at the same time stop producing good data with similar symptoms...

Phlya · 2018-05-22T19:33:56Z

It seems like quality (scaling shallowness) is correlated to this bias, yes. But hard to say definitively.

golobor · 2018-05-22T19:39:42Z

Ah, great point. Btw, your multimappers went through the roof - more than a half of the sample. Maybe you can check them manually to see what's wrong with them?

…

On Tue, May 22, 2018, 15:33 Ilya Flyamer ***@***.***> wrote: It seems like quality (scaling shallowness) is correlated to this bias, yes. But hard to say definitively. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <https://github.com/mirnylab/pairsamtools/issues/68#issuecomment-391113862>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA3uCuD5nLm5WV9JRqTs-6QC18S-4oyAks5t1GgkgaJpZM4UFGjp> .

Phlya · 2018-05-22T19:42:23Z

And actually I was wrong when I said that it's not present in forward reads - reverse reads are certainly more affected, but in at least one run most samples have it in forward reads too to a different extent - which seems to correlate with just how bad they are.

Phlya · 2018-05-22T19:46:01Z

Yeah, I noticed that too, and it's present (again, to a different extent) in a few samples I checked, but not sure how to look at them - like, what exactly should I look into?

Phlya · 2018-05-22T19:51:52Z

Also, how about we switch to e.g. G-chat @golobor ? I doubt everyone is interested in this!

wbszhu · 2021-06-23T08:49:04Z

Hi,
Is there any way that we can get another column (pair_frags_type?)? I really expect this kind of stats.
Thanks a lot.
Luzhang

Phlya mentioned this issue May 20, 2018

Example stats open2c/distiller-nf#96

Open

sergpolly mentioned this issue Jan 29, 2020

Make a module for multi-qc #78

Closed

agalitsyna mentioned this issue Apr 6, 2022

pairtools v1.0.0 roadmap #116

Closed

31 tasks

agalitsyna added the enhancement label Apr 20, 2022

agalitsyna mentioned this issue Aug 24, 2022

post-v1.0.0 proposals and issues (header post) #139

Open

13 tasks

Fragment-level analysis #68

Fragment-level analysis #68

Comments

Phlya commented May 18, 2018

Phlya commented May 18, 2018

nvictus commented May 18, 2018

Phlya commented May 18, 2018 • edited

Phlya commented May 19, 2018 • edited

gfudenberg commented May 20, 2018 via email

Phlya commented May 20, 2018 • edited

gfudenberg commented May 20, 2018 via email

Phlya commented May 20, 2018 • edited

gfudenberg commented May 20, 2018 via email

nvictus commented May 20, 2018

gibcus commented May 20, 2018 via email

Phlya commented May 20, 2018 • edited

Phlya commented May 21, 2018

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018 • edited

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018

Phlya commented May 21, 2018

Phlya commented May 21, 2018

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018 • edited

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018

golobor commented May 21, 2018 via email

Phlya commented May 21, 2018 • edited

Phlya commented May 22, 2018 • edited

golobor commented May 22, 2018 via email

Phlya commented May 22, 2018

Phlya commented May 22, 2018

golobor commented May 22, 2018 via email

Phlya commented May 22, 2018

Phlya commented May 22, 2018

golobor commented May 22, 2018 via email

Phlya commented May 22, 2018

Phlya commented May 22, 2018

Phlya commented May 22, 2018

wbszhu commented Jun 23, 2021

Phlya commented May 18, 2018 •

edited

Phlya commented May 19, 2018 •

edited

Phlya commented May 20, 2018 •

edited

Phlya commented May 20, 2018 •

edited

Phlya commented May 20, 2018 •

edited

Phlya commented May 21, 2018 •

edited

Phlya commented May 21, 2018 •

edited

Phlya commented May 21, 2018 •

edited

Phlya commented May 22, 2018 •

edited