missing reads region has valid insulation score #153

Open
caragraduate opened this issue Apr 9, 2023 · 4 comments

@caragraduate

Hi there,

Thank you for maintaining the tool! I have a question regarding two of my samples. Both appear to have regions with missing reads in the Hi-C contact map (the white stripes between 72.2 Mb and 72.4 Mb).
[Figure: GM18534-chr1-72300641-DEL-45516]
[Figure: GM18939-chr1-72300641-DEL-45516]

I expected to see no insulation scores over those regions, since there are no valid aligned reads, as in the first figure. However, in the second figure, even though reads are missing, the second panel still shows valid insulation scores. I wonder whether this indicates that something is wrong with the analysis, or whether it has some other interpretation.

Another question: both samples actually contain a large deletion of about 45 kb, located at the same position as the white stripes. When a white or gray region shows up in the heatmap (sometimes it is gray, and I am not sure what the difference between the two colors is), how can I tell whether it reflects missing reads in the raw sequencing data or a deletion in the sample?

Thank you in advance for any comments on this question!

@kaukrise
Collaborator

Hi!

Grey regions denote masked regions. This is a feature of FAN-C, where regions can be declared "invalid", typically through filtering. If a bin has no reads, it is automatically declared invalid (=masked) when you are using FAN-C format. The low coverage filter (fanc hic --filter-low-coverage-relative or --filter-low-coverage) is one example of a filter that removes contacts for specific bins. In most FAN-C commands, including the insulation command, masked bins are ignored.
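
To illustrate what a low coverage filter does conceptually, here is a minimal Python sketch. It is not FAN-C's code: the function name low_coverage_mask and the rule it applies (mask a bin when its total contacts fall below a fraction of the median bin coverage) are assumptions made for illustration, as are the toy numbers.

```python
import numpy as np

def low_coverage_mask(matrix, relative_cutoff):
    """Return True for bins that this sketch would mask.

    Assumption (not taken from FAN-C): a bin is masked when it has no
    contacts at all, or when its total contacts are below
    relative_cutoff * median coverage of all bins.
    """
    m = np.asarray(matrix, dtype=float)
    coverage = m.sum(axis=1)                      # total contacts per bin
    threshold = relative_cutoff * np.median(coverage)
    return (coverage == 0) | (coverage < threshold)

# Hypothetical 5-bin matrix with contacts only on the diagonal,
# so the per-bin coverage is simply 100, 90, 0, 5, 110.
toy_matrix = np.diag([100.0, 90.0, 0.0, 5.0, 110.0])

mask = low_coverage_mask(toy_matrix, relative_cutoff=0.1)
print(mask)  # bin 2 (empty) and bin 3 (very low coverage) are masked
```

In this toy case the empty bin is masked in any case, while the bin with only a few contacts is masked only because of the cutoff. A stricter cutoff therefore masks more sparsely covered bins.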

When using a different format, there is no direct equivalent for masking. When using Juicer files, FAN-C relies on NaN values in the bias vector to mask regions. When there is no NaN in the vector, bins will not be masked, and signal is interpreted as 0. There is currently no FAN-C support for masking bins in Cooler files.

White regions are simply regions with 0 contacts.
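
The distinction between masked (grey) and zero-contact (white) bins for Juicer files can be sketched as follows. This is illustrative Python with NumPy, not FAN-C's implementation: mask_from_bias, the bias vector, and the count matrix are all made up for the example, and the only rule assumed is the one described above, namely that a bin is masked when its bias value is NaN.

```python
import numpy as np

def mask_from_bias(bias):
    """Return True for bins whose bias value is NaN (treated as masked)."""
    return np.isnan(np.asarray(bias, dtype=float))

# Hypothetical bias vector for five bins; only bin 2 has a NaN bias.
bias = [1.1, 0.9, np.nan, 1.0, 1.2]
masked = mask_from_bias(bias)

# Hypothetical counts; bins 2 and 3 both have no contacts at all.
counts = np.array([
    [5, 2, 0, 0, 1],
    [2, 6, 0, 0, 2],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 2, 0, 0, 4],
], dtype=float)

# Masked bins become NaN (grey in a plot); unmasked empty bins stay 0 (white).
display = counts.copy()
display[masked, :] = np.nan
display[:, masked] = np.nan

print(masked)      # only bin 2 is masked
print(display[3])  # bin 3 keeps its zeros, except where it meets masked bin 2
```

Bins 2 and 3 are equally empty here, but only bin 2 ends up masked, because only its bias value is NaN. Bin 3 is kept as real zeros and would enter downstream calculations as such.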

What file formats and normalisation method were you using for the different plots? What preprocessing pipeline did you use?

If you use the same pipeline for both plots, my guess would be that the second sample still has signal somewhere in those bins (outside of the matrix section you are showing). If you are using FAN-C, stricter filtering as outlined above could fix the issue.

@kaukrise
Collaborator

On closer inspection of the image, you can even see some pixels with signal in those regions that prevent masking of the bin:
[Image]
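
One way to check for such leftover signal is to compare each bin's total contacts over the whole matrix with its contacts inside the plotted window. A rough sketch in Python with NumPy, where the toy matrix and the helper bins_without_any_contacts are hypothetical and stand in for however you export the contact matrix:

```python
import numpy as np

def bins_without_any_contacts(matrix):
    """Return indices of bins whose row contains no contacts at all."""
    m = np.asarray(matrix, dtype=float)
    return np.flatnonzero(np.nansum(m, axis=1) == 0)

# Hypothetical symmetric 6-bin matrix.
full = np.zeros((6, 6))
covered = [0, 1, 4, 5]
full[np.ix_(covered, covered)] = 1.0   # bins 0, 1, 4, 5 contact each other
full[3, 5] = full[5, 3] = 1.0          # bin 3 has one contact, with bin 5

# The plotted section only covers bins 0 to 4.
view = full[:5, :5]

print(bins_without_any_contacts(view))  # bins 2 and 3 look empty in the plot
print(bins_without_any_contacts(full))  # only bin 2 is empty everywhere
```

In this toy case bin 3 looks just as empty as bin 2 inside the plotted section, but a single contact outside the section is enough for it to count as a bin with reads.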

@caragraduate
Author

Hi there,

Thank you for the detailed explanation! That makes a lot of sense to me. I preprocess both samples with Juicer using SCALE normalization, then load the .hic file into FAN-C to compute insulation scores and plot. As a follow-up, is there any difference between the two colors, light gray and pure white?

Many thanks then~

@kaukrise
Collaborator

White regions are counted as 0. For the insulation score, that means these 0s are taken into account in the sliding window, which is shown here:

https://fan-c.readthedocs.io/en/latest/fanc-executable/fanc-analyse-hic/domains.html#insulation-score

[Figure: Insulation score example]

Therefore you will observe a drop in insulation score at the predominantly white locations in your plots.

Gray regions are masked, and are therefore treated as NAs. If an insulation window has more than 50% missing values, it is also marked as invalid. For windows with less than 50% masked pixels, the normalisation of insulation scores also takes masked values into account, so partially masked windows are not penalised.
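
A simplified sketch of the difference, in Python with NumPy. It is not FAN-C's implementation: it only averages a square window next to the diagonal and omits the normalisation step, and the function name insulation_sketch and the toy matrices are assumptions for illustration. Zeros are averaged in, NaNs are skipped, and a window with more than 50% NaN is marked invalid.

```python
import numpy as np

def insulation_sketch(matrix, window, max_masked_fraction=0.5):
    """Mean contact value in a window x window square beside the diagonal.

    Returns NaN at the matrix edges and wherever more than
    max_masked_fraction of the square is NaN (masked).
    """
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        # Contacts between the `window` bins before i and the `window` bins from i on.
        square = m[i - window:i, i:i + window]
        if np.isnan(square).mean() > max_masked_fraction:
            continue  # too many masked pixels: leave the score invalid
        scores[i] = np.nanmean(square)  # masked pixels are skipped, zeros are not
    return scores

# Hypothetical 8-bin matrix with uniform signal; bins 3 and 4 are the problem region.
base = np.ones((8, 8))

white = base.copy()            # region stored as real zeros
white[3:5, :] = 0.0
white[:, 3:5] = 0.0

grey = base.copy()             # same region stored as masked (NaN)
grey[3:5, :] = np.nan
grey[:, 3:5] = np.nan

white_scores = insulation_sketch(white, window=2)
grey_scores = insulation_sketch(grey, window=2)
print(white_scores)  # valid numbers that drop towards 0 over bins 3 to 5
print(grey_scores)   # NaN over bins 3 to 5, unchanged 1.0 next to them
```

With zeros, every window still yields a number, and the score dips over the empty region. With NaN, the windows dominated by the region become invalid, while the neighbouring windows that are exactly half masked keep the score of their unmasked pixels.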
