All pairs are corrupt ("XX") #221

Open
lu-r-lu opened this issue Mar 23, 2024 · 6 comments

lu-r-lu commented Mar 23, 2024

Hello,

A colleague and I are stuck trying to figure out why we are getting the dreaded XX quality indicator for all our pairs when we run:
samtools view -h file1.hicup.bam | pairtools parse -c hg38.simple-chrom.sizes -o parsed_file1.pairsam.gz

I have run this exact command on another BAM file (from a different experiment) and it worked fine, so we were wondering whether something in this BAM file might be corrupt or otherwise wrong. It is not sorted (we checked), and the format of the chromosome names is consistent. Quality-wise, we would say the FASTQ file is okay, not fantastic, and it aligned okay. What else could it be?

P.S. The pairs all look like the following few lines:
image

All suggestions are sincerely appreciated!

Member

golobor commented Mar 23, 2024

it seems that you're missing either sam1 or sam2. This means that your .sam entries for R1 and R2 got unpaired from one another for whatever reason. Could you check the content of file1.hicup.bam - do you see pairs of alignments for each read there?

Author

lu-r-lu commented Mar 25, 2024

@golobor Thank you so much for the reply. I think I have both SAM1 and SAM2. We have tested it this way (hopefully the right way):

$ samtools view -F 0x4 file1.hicup.bam | awk '{ if(and($2, 64)) count1++; else count2++ } END { print "SAM1 count:", count1; print "SAM2 count:", count2 }'
SAM1 count: 54353243
SAM2 count: 54353243

Any thoughts?
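
The flag tally above relies on `and()`, which is a gawk extension. The same count can be sketched in Python; this is a minimal illustration, not part of samtools or pairtools. `count_mates` is a hypothetical helper name, and the input is assumed to be SAM text (for example the output of `samtools view`). Unlike the one-liner, it tests the 0x40 (first in pair) and 0x80 (second in pair) bits separately, so records with neither bit set are not folded into the second count.

```python
from collections import Counter

# SAM FLAG bits used below
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def count_mates(sam_lines):
    """Tally mapped SAM records by mate (hypothetical helper)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & UNMAPPED:
            continue  # same filter as `samtools view -F 0x4`
        if flag & FIRST_IN_PAIR:
            counts["R1"] += 1
        elif flag & SECOND_IN_PAIR:
            counts["R2"] += 1
        else:
            counts["unpaired"] += 1
    return counts
```

It could be fed from a pipe by passing `sys.stdin` as `sam_lines`. Note that equal R1 and R2 counts only show that both mates are present somewhere in the file; they say nothing about whether the two mates of a read sit next to each other.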

Member

golobor commented Mar 25, 2024

Could you show the first few alignments?

Author

lu-r-lu commented Mar 25, 2024

Let me know if this is helpful and if I am missing something, of course! TY

image

Member

golobor commented Mar 25, 2024

For some reason, you seem to have one alignment per readID, whereas pairtools parse expects at least two alignments per readID to identify contacts.
I do not have enough info to tell you why this happened. One simple option to try is to sort the .bam file by readID. Maybe the alignments are there but got de-synchronized for whatever reason?
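
The "one alignment per readID" symptom can be checked without eyeballing the file. Below is a minimal sketch, assuming SAM text as input and assuming that "per readID" means consecutive records sharing the same QNAME (first column). `run_lengths` is a hypothetical helper, not a pairtools function.

```python
from collections import Counter
from itertools import groupby


def run_lengths(sam_lines):
    """Map run length -> number of runs of consecutive records
    that share a read ID (hypothetical helper)."""
    # Drop blank lines and "@" header lines, keep alignment records.
    records = (l for l in sam_lines if l.strip() and not l.startswith("@"))
    sizes = Counter()
    # groupby only merges *adjacent* records with an equal key, so mates
    # that are far apart in the file show up as separate runs of length 1.
    for _, run in groupby(records, key=lambda l: l.split("\t", 1)[0]):
        sizes[sum(1 for _ in run)] += 1
    return sizes
```

If nearly every run has length 1, the mates are present but not adjacent, which matches the situation described here. In a name-grouped file, runs of length 2 or more would be expected to dominate.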

Author

lu-r-lu commented Mar 26, 2024

You were right about the odd sorting. As far as I was told, this is how the files came out of HiCUP.

I've done the following and then it worked fine:
samtools sort -n file1.hicup.bam -o sortedID_file1.hicup.bam

Thank you, really appreciate it!
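
To illustrate why sorting by read name fixes the problem, here is a toy sketch of name-grouping on in-memory SAM text. It is not how `samtools sort -n` is implemented (that tool has its own name ordering and handles files larger than memory); `name_group` is a hypothetical helper that only shows the effect of bringing mates together.

```python
def name_group(sam_lines):
    """Return header lines first, then records ordered by read ID
    (hypothetical helper; toy stand-in for name sorting)."""
    headers, records = [], []
    for line in sam_lines:
        if line.startswith("@"):
            headers.append(line)
        elif line.strip():
            records.append(line)
    # list.sort is stable, so alignments of the same read keep
    # their original relative order.
    records.sort(key=lambda l: l.split("\t", 1)[0])
    return headers + records
```

After this reordering, the records of each read are adjacent, which is the layout the thread identifies as what pairtools parse needs to pair the two sides of a contact.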
