Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: Question regarding low peak numbers #76

Open
AntonioAhn opened this issue Oct 11, 2020 · 1 comment
Open

Q: Question regarding low peak numbers #76

AntonioAhn opened this issue Oct 11, 2020 · 1 comment

Comments

@AntonioAhn
Copy link

Thank you for HMMRATAC. I would love to use this tool but currently, I am having issues with low peak numbers (1262 peaks). This is low in comparison to peaks found with MACS2 (83610 peaks). I have paired-end ATACseq data derived from a cancer cell line.

I ran the tool with:
java -jar HMMRATAC.jar -b ATAC.sorted.bam -i ATAC.sorted.bam.bai -g genome.info -o out_HMM

The model.log shows that the open state (state 2) has the highest emission parameters for all 4 signal components, the nucleosome state (state 1) has the second-highest emission for all signal components and the background state (state 0) has the lowest. I was wondering if my model shows something else wrong that is giving me low peak numbers?

I have also run HMMRATAC with a more stringent setting for creating the model: by increasing -u and -l however this does not change the peak numbers. For example, running with -u 40 -l 20 gave me 1254 peaks.

Would you be able to give some advice on what the issue could be and also how to correct the problem?
I have also included a figure showing the fragment size distribution in case this may help.

The model log with default settings shows:

Fragment Expectation Maximum Done
Mean 50.0 StdDevs 20.0
Mean 177.86633552152415 StdDevs 51.64088623246107
Mean 385.458734684104 StdDevs 75.7992181115244
Mean 652.0928097547281 StdDevs 75.62727395159729
ScalingFactor 21.994759
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)

State 0
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.021 0.03 0.006 0.011 ]

State 1
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.06 0.13 0.203 0.052 ]

State 2
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.206 0.381 0.424 0.169 ]

Model created and refined. See test.model
Model:
HMM with 3 state(s)

State 0
Pi: 0.3333333333333333
Aij: 0.968 0.015 0.017
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0 0 0 0 ]

State 1
Pi: 0.3333333333333333
Aij: 0.017 0.955 0.028
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.024 0.085 0.192 0.059 ]

State 2
Pi: 0.3333333333333333
Aij: 0.022 0.032 0.946
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.144 0.232 0.187 0.066 ]

picard_insert_size_HMMRATAC_issue_question

@AntonioAhn AntonioAhn changed the title Q: Q: Question regarding low peak numbers Oct 11, 2020
@EvanTarbell
Copy link
Collaborator

It seems like the issue is with the third signal track. In state 2, it is 0.187 while in state 1 it is 0.192.
I would try even more stringent -u and -l, for instance -u 50 and -l 25.
You can also lower the value for --threshold, from 30 to something smaller to increase the peak number - to say 20 or 10. But you will really want to find a set of -u and -l values that correct that issue with the third signal track. That is the likely culprit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants