Biological signal missing from cool files #385

Open
NMaziak opened this issue Feb 2, 2024 · 2 comments

NMaziak commented Feb 2, 2024

Hi there,

First, thanks for such handy tools! We recently changed clusters, and in the process I upgraded cooler from version 0.8.6 to the most recent version. I'm working with high-resolution Micro-C maps, most of which reach more than 10k reads per bin at a resolution finer than 200 bp. While scrolling through HiGlass, I saw some differences between my new and old cool files.

I have checked that the difference comes from the mcool file itself and not from HiGlass: there are regions that should be mappable but get covered up (the first attached screenshot shows a region at 500 bp resolution; top is the older version, bottom is the newer version). When I zoom in to 125 bp, the regions hidden in the middle of the newer-version mcool go away and the signal is visible (screenshot 2). I looked at my logs and saw that balancing of these coolers could not converge towards the end (in both versions), so I remade the cool files starting from 1 kb. While these did converge, I had the same issue. I'm not sure why that is, and was wondering what could have changed between versions to cause this. Is it the new use of the CLI?

Sometimes these show up where I know there is a region without any mapping, but sometimes these grey lines appear over nicer regions which shouldn't cause any issue. In the region above, judging by its behavior, it looks to me like it is removing biological signal.

I have also double-checked that these don't coincide with blacklisted or unmappable regions.
This is fly data by the way.

Thanks for the help,
Noura
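[Screenshot 1 (2024-02-02, 16:09:19): region at 500 bp resolution; top is the older version, bottom is the newer version]
[Screenshot 2 (2024-02-02, 16:21:00): the same region at 125 bp resolution]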

nvictus (Member) commented Feb 16, 2024

There must have been some difference in the filtering parameters for balancing between the first time you ran it and now. I don't think any of the default thresholds changed on the command line but it's possible it was run with different parameters the first time:
[image attachment]

> grey lines appear over nicer regions which shouldn't cause any issue

I'm curious if there are issues much farther in trans, or if the raw counts just happen to be relatively low in those bins. The mad-max filter might be too aggressive for your dataset, so you can try turning it off entirely by setting it to 0, or making it less aggressive by setting the value higher (e.g. 9-12).
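For readers unfamiliar with the mad-max filter, here is a minimal sketch of the idea, assuming it masks bins whose log marginal (total contacts per bin) falls more than `mad_max` median absolute deviations below the median. The function name `mad_max_masked_bins` is hypothetical, and the sketch skips details such as the per-chromosome normalization that cooler may apply first, so treat it as an illustration rather than cooler's actual code.

```python
import math
from statistics import median


def mad_max_masked_bins(marginals, mad_max=5):
    """Return indices of bins that a MAD-based low-coverage filter would mask.

    Hypothetical helper for illustration only; not part of cooler.
    """
    if mad_max <= 0:
        return []  # a value of 0 disables the filter entirely

    # Work on log-transformed, nonzero marginals.
    log_nz = [math.log(m) for m in marginals if m > 0]
    if not log_nz:
        return []

    med = median(log_nz)
    mad = median([abs(x - med) for x in log_nz])

    # Bins whose marginal falls below this cutoff are masked.
    cutoff = math.exp(med - mad_max * mad)
    return [i for i, m in enumerate(marginals) if m < cutoff]


# Toy marginals: five well-covered bins, one low-coverage bin, one empty bin.
toy = [100, 110, 90, 105, 95, 20, 0]
for value in (5, 12, 30, 0):
    print(value, mad_max_masked_bins(toy, mad_max=value))
```

In this sketch a larger `mad_max` pushes the cutoff lower, so fewer bins are masked, and 0 switches the filter off. A bin with genuinely low but real coverage can still fall below the cutoff, which is consistent with the suggestion above that the filter may be too aggressive for some datasets.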

NMaziak (Author) commented Apr 8, 2024

Hi there,

You were right: mad-max was the culprit, and setting it to zero solved the issue. Setting it to be less stringent helped in certain locations, but it still caused issues when zooming out.
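As a rough sketch of how such a rebalancing run could be scripted, the snippet below only assembles a `cooler balance` command line and prints it. The file name `sample.mcool`, the chosen resolution, and the helper `cooler_balance_command` are hypothetical; the flags `--mad-max`, `--max-iters`, and `--force` are the ones I would expect from the cooler CLI, but check `cooler balance --help` for your installed version.

```python
import shlex


def cooler_balance_command(uri, mad_max=0, max_iters=200, force=True):
    """Build an argument list for `cooler balance` (hypothetical helper)."""
    cmd = [
        "cooler", "balance",
        "--mad-max", str(mad_max),      # 0 turns the MAD filter off
        "--max-iters", str(max_iters),  # iteration limit for balancing
    ]
    if force:
        cmd.append("--force")  # overwrite balancing weights that already exist
    cmd.append(uri)
    return cmd


# Each resolution inside an .mcool file is addressed with its own URI.
cmd = cooler_balance_command(
    "sample.mcool::resolutions/500", mad_max=0, max_iters=1000
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Printing the command before running it makes it easy to record the exact filtering parameters in a log, which helps when comparing files produced at different times.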

I also wanted to point out that my data had a difficult time converging within 200 iterations when balancing. It worked fine after increasing max-iters, which is great, but the strange part is that this wasn't correlated with sequencing depth or resolution, but rather with how "structured" a sample is.

This isn't specific to my Micro-C data; it was also present in human Hi-C samples we processed. Would you happen to know why that is?
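To make "converging within a given number of iterations" concrete, here is a simplified sketch of iterative correction on a small dense symmetric matrix. It assumes convergence means that the variance of the mean-normalized bin marginals drops below a tolerance; the function `ice_balance`, its defaults, and the toy matrix are illustrative assumptions, not cooler's implementation, and the sketch does not attempt to answer the question above.

```python
from statistics import mean, pvariance


def ice_balance(matrix, tol=1e-5, max_iters=200):
    """Simplified iterative correction (illustrative, not cooler's code).

    Returns (bias, converged, n_iter).
    """
    n = len(matrix)
    bias = [1.0] * n
    for n_iter in range(1, max_iters + 1):
        # Marginal of each bin under the current bias vector.
        marg = [
            sum(bias[i] * bias[j] * matrix[i][j] for j in range(n))
            for i in range(n)
        ]
        nonzero = [m for m in marg if m != 0]
        if not nonzero:
            return bias, False, n_iter
        avg = mean(nonzero)

        # Converged once the normalized marginals are nearly uniform.
        if pvariance([m / avg for m in nonzero]) < tol:
            return bias, True, n_iter

        # Otherwise divide each bias by its relative marginal and repeat.
        bias = [b / (m / avg) if m != 0 else b for b, m in zip(bias, marg)]
    return bias, False, max_iters


# Toy symmetric matrix with an empty main diagonal.
toy = [
    [0, 2, 3, 4],
    [2, 0, 6, 8],
    [3, 6, 0, 12],
    [4, 8, 12, 0],
]
bias, converged, n_iter = ice_balance(toy)
print(converged, n_iter)
```

Lowering `max_iters` in this sketch makes the loop return `converged=False` before the marginals even out, which is the situation a balancing log reports when the iteration limit is reached.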

All the best and thanks again!

Best,
Noura
