The same input bam, fasta and models, but different output size of merge_output.vcf.gz #303

Open
Jerry-is-a-mouse opened this issue Apr 26, 2024 · 5 comments

Comments

@Jerry-is-a-mouse

Hi, I used Clair3 (v1.0.5) to call variants on HG002 PacBio HiFi 15-20kb chemistry2 reads. I ran the run_clair3.sh command twice with the same inputs, but the two merge_output.vcf.gz files differ in size. Is this expected? In other words, is the difference caused by the algorithm Clair3 uses?

@aquaskyline
Member

Could you please look into the two VCFs and see what the differences are?
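One way to look into the differences is to reduce each VCF to a set of (CHROM, POS, REF, ALT) keys and take the set difference in both directions. The sketch below is a minimal illustration using only the Python standard library; the function names `load_variant_keys` and `diff_variant_keys` are hypothetical and not part of Clair3, and the approach assumes both files fit in memory as key sets and that comparing by these four columns is sufficient (genotype and quality differences are ignored).

```python
import gzip


def load_variant_keys(path):
    """Return the set of (CHROM, POS, REF, ALT) tuples in a VCF or VCF.gz file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    keys = set()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines (## meta lines and the #CHROM line) and blanks.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            keys.add((chrom, int(pos), ref, alt))
    return keys


def diff_variant_keys(path_a, path_b):
    """Return (only_in_a, only_in_b) as sorted lists of variant keys."""
    keys_a = load_variant_keys(path_a)
    keys_b = load_variant_keys(path_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)
```

Calling `diff_variant_keys("run1/merge_output.vcf.gz", "run2/merge_output.vcf.gz")` (paths are placeholders) would return the records unique to each run, which is usually more informative than comparing file sizes.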

@Jerry-is-a-mouse
Author

@aquaskyline I counted the variant records in each file with wc, as follows:
(1) The vcf.gz I got yesterday:
less merge_output.vcf.gz | grep -v "^#" | wc -l
4443956
(2) The vcf.gz I got about 2 months ago:
less HG002_Nanopore.vcf.gz | grep -v "^#" | wc -l
4527382
I'm sorry, but the files are too big to upload.
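The two totals above differ by 83,426 records (4,527,382 minus 4,443,956). Since the files cannot be uploaded, a per-contig breakdown of the record counts is a compact summary that can be pasted into the thread instead. The following is a rough sketch under the assumption that both files are plain or gzip/bgzip-compressed VCFs; `count_records_by_chrom` and `count_deltas` are hypothetical helper names.

```python
import gzip
from collections import Counter


def count_records_by_chrom(path):
    """Count non-header VCF records per contig (first column)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # header or blank line
            counts[line.split("\t", 1)[0]] += 1
    return counts


def count_deltas(counts_a, counts_b):
    """Return {contig: count_b - count_a} for contigs whose counts differ."""
    contigs = set(counts_a) | set(counts_b)
    return {
        contig: counts_b.get(contig, 0) - counts_a.get(contig, 0)
        for contig in sorted(contigs)
        if counts_a.get(contig, 0) != counts_b.get(contig, 0)
    }
```

If the deltas are concentrated on a few contigs, that may point to a run that was interrupted or given a different region list; if they are spread evenly, a difference in model or parameters seems more likely.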

@aquaskyline
Member

One of your files is named Nanopore, but you said you were using the same PacBio HiFi input for both runs?

@Jerry-is-a-mouse
Author

Sorry, what I used is Nanopore sequencing data. I re-ran both types of data and found that the PacBio HiFi results are identical, but the Nanopore results differ.

@aquaskyline
Member

Outputs of Clair3 are deterministic. You might want to try again using the same version, model, and parameters.
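To check that two runs really used the same inputs, one option is to hash the input files (BAM, FASTA, model files) and to compare the `##` meta lines of the two output VCFs. The sketch below is only an illustration: `sha256_of_file` and `header_meta_lines` are hypothetical helpers, and whether the VCF header records the version, model, or command line depends on the tool and version that wrote it, which is not confirmed here.

```python
import gzip
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def header_meta_lines(path):
    """Return the set of '##' meta lines at the top of a VCF or VCF.gz file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lines = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM line or the first record
            lines.add(line.rstrip("\n"))
    return lines
```

Identical digests for the inputs of both runs, together with `header_meta_lines(a) ^ header_meta_lines(b)` being empty, would suggest the runs were configured the same way; any differing header line is a good place to start looking.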
