Incongruities at program results #17

Open
dukelheit opened this issue Sep 26, 2020 · 3 comments

@dukelheit

Hello, I'm using SNPmatch on a VCF file that contains 7 replicates of each of 3 different plants (21 samples in total). I built the database from this VCF and then split it into 21 single-sample VCFs so I could run the program on each one. After running the program 21 times, I made a heatmap of the "probability of match" results, and I see incongruities. For example, in the RGS vs REED comparison, subset 1 shows more similarity than subset 2. How should I interpret this?

[Image: SNPMATCHresults, heatmap of the "probability of match" values]
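One way to make the "subset 1 vs subset 2" comparison concrete is to measure how asymmetric the matrix is. Each run scores one query VCF against every accession in the database, and nothing in that setup forces the value for query *i* against accession *j* to equal the value for query *j* against accession *i*. The awk sketch below assumes the matrix was saved as a tab-separated file with sample names in the first row and first column, with rows in the same order as columns; that layout and the `max_asymmetry` helper are hypothetical and not part of SNPmatch. It prints the pair with the largest difference.

```bash
# Hypothetical helper (not part of SNPmatch): report the pair of samples with
# the largest |M[i,j] - M[j,i]| in a square tab-separated matrix.
# Assumed layout: row 1 = a label plus column names; each later row = row name
# plus values, with rows in the same order as the columns.
max_asymmetry() {
  awk -F'\t' '
    NR == 1 {                                # header row: column names
      n = NF - 1
      for (c = 2; c <= NF; c++) name[c - 1] = $c
      next
    }
    {                                        # data rows: store values by index
      r = NR - 1
      for (c = 2; c <= NF; c++) m[r, c - 1] = $c
    }
    END {
      best = 0; bi = 0; bj = 0
      for (i = 1; i <= n; i++)
        for (j = i + 1; j <= n; j++) {
          d = m[i, j] - m[j, i]
          if (d < 0) d = -d
          if (d > best) { best = d; bi = i; bj = j }
        }
      printf "%s\t%s\t%.4f\n", name[bi], name[bj], best
    }' "$1"
}

# Usage (file name is hypothetical):
#   max_asymmetry probability_matrix.tsv
```

A large difference for a pair of replicates would point at the inputs (for example, different numbers of called sites per query) rather than at the plotting step, though that is only one possible explanation.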

@rbpisupati (Member)

Hi @dukelheit! Can you please explain a bit more about which functions you are using? Does your VCF file contain only biallelic sites? Are you applying any other filtering while generating the VCF?
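The biallelic question can be checked directly from the uncompressed VCF. This is a minimal awk sketch with a hypothetical `vcf_site_summary` helper; it only inspects the REF and ALT columns (4 and 5) and ignores genotypes. A commented bcftools alternative for the multiallelic count is included.

```bash
# Hypothetical helper: summarise site types in an uncompressed VCF using
# only the REF (column 4) and ALT (column 5) fields; genotypes are ignored.
#   multiallelic: ALT lists more than one allele (contains a comma)
#   non_snp:      REF or any ALT allele is longer than 1 bp
#   no_alt:       ALT is "." (no alternate allele at the site)
# A bcftools alternative for the multiallelic count (sites with >= 3 alleles):
#   bcftools view -H -m3 named.vcf | wc -l
vcf_site_summary() {
  awk -F'\t' '
    /^#/ { next }                           # skip meta and header lines
    {
      total++
      if ($5 == ".") { noalt++; next }
      if (index($5, ",") > 0) multi++
      n = split($5, alts, ",")
      snp = (length($4) == 1)
      for (k = 1; k <= n; k++) if (length(alts[k]) != 1) snp = 0
      if (!snp) nonsnp++
    }
    END {
      printf "total=%d multiallelic=%d non_snp=%d no_alt=%d\n",
             total, multi, nonsnp, noalt
    }' "$1"
}

# Usage: vcf_site_summary named.vcf
```

For strictly biallelic SNPs, bcftools offers `-m2 -M2 -v snps`; `--max-alleles 2` on its own (as in the filtering command in this thread) also keeps sites that carry only a REF allele, which the `no_alt` count would reveal.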

@dukelheit (Author)

Hi again.
Sure, here is the code I used to process my VCF:

```bash
# Filter the VCF (biallelic inclusive)
bcftools view -i 'INFO/MQ>40 && INFO/FS<60 && INFO/QD>2 && INFO/SOR<3.0 && INFO/MQRankSum>-12.5 && INFO/ReadPosRankSum>-8.0 && %QUAL>=30' --max-alleles 2 --exclude-types indels -Ov named.vcf.gz > named.vcf

# Split the VCF file into one VCF per sample
for sample in $(bcftools query -l named.vcf); do
  java -jar /home/franco/BinsPortables/gatk-4.1.5.0/gatk-package-4.1.5.0-local.jar SelectVariants -V named.vcf -O "${sample}.vcf" -sn "$sample"
done

# Make the database
snpmatch makedb -i named.vcf -o DBs/db

# Run SNPmatch on every VCF in the working directory
for sample in *.vcf; do
  snpmatch inbred -d DBs/db.hdf5 -e DBs/db.acc.hdf5 -i "$sample" -o "Op/output_snpmatch_${sample}"
done
```
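One detail in the last loop: assuming all commands run from the same directory, iterating over every `.vcf` file there also picks up the merged `named.vcf` written by the filtering step, so that file is scored as a 22nd query. A minimal bash sketch of a stricter input list, using a hypothetical `list_single_sample_vcfs` helper that skips the merged file and any VCF whose `#CHROM` header does not have exactly one sample column:

```bash
# Hypothetical helper: print the per-sample VCFs in a directory that are
# suitable as single-sample queries, skipping the merged input file and any
# VCF whose #CHROM header does not list exactly one sample column.
list_single_sample_vcfs() {
  local dir=$1 merged=$2 vcf nsamples
  for vcf in "$dir"/*.vcf; do
    [ -e "$vcf" ] || continue              # no matches: glob stays literal
    [ "$(basename "$vcf")" = "$merged" ] && continue
    # Sample columns follow the 9 fixed columns (CHROM ... FORMAT).
    nsamples=$(awk -F'\t' '/^#CHROM/ { print NF - 9; exit }' "$vcf")
    if [ "$nsamples" = "1" ]; then
      printf '%s\n' "$vcf"
    else
      printf 'skipping %s (%s sample columns)\n' "$vcf" "${nsamples:-0}" >&2
    fi
  done
}

# Usage (same database and output paths as the loop above):
#   for vcf in $(list_single_sample_vcfs . named.vcf); do
#     snpmatch inbred -d DBs/db.hdf5 -e DBs/db.acc.hdf5 -i "$vcf" -o "Op/output_snpmatch_$(basename "$vcf")"
#   done
```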

Finally, I plot the 21 × 21 matrix of "probability of match" results.
Thanks in advance,
Franco
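To build the 21 × 21 heatmap input, the per-run results can be stacked into one long table (query, accession, probability) and pivoted. Extracting those three columns from the SNPmatch output files depends on their layout, which this thread does not show, so the sketch below starts from the long table. The `pivot_matrix` helper is hypothetical and assumes the query labels match the accession names in the database; missing pairs appear as `NA`, which makes gaps easy to spot before plotting.

```bash
# Hypothetical helper: pivot a long table (query<TAB>accession<TAB>value)
# into a square matrix for plotting. Rows are queries and columns are
# accessions, both listed in order of first appearance; missing pairs are "NA".
pivot_matrix() {
  awk -F'\t' '
    {
      if (!($1 in seen)) { seen[$1] = 1; names[++n] = $1 }
      if (!($2 in seen)) { seen[$2] = 1; names[++n] = $2 }
      v[$1, $2] = $3                         # keyed by query SUBSEP accession
    }
    END {
      line = "query"
      for (j = 1; j <= n; j++) line = line "\t" names[j]
      print line
      for (i = 1; i <= n; i++) {
        line = names[i]
        for (j = 1; j <= n; j++) {
          key = names[i] SUBSEP names[j]
          cell = (key in v) ? v[key] : "NA"
          line = line "\t" cell
        }
        print line
      }
    }' "$1"
}

# Usage (file names are hypothetical):
#   pivot_matrix long_table.tsv > probability_matrix.tsv
```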

@rbpisupati (Member)

Thank you! I still do not see why you are getting such results. We use the same functions to identify identical strains, if any, within our datasets. I am not sure whether this is a bug in the program or an issue with the data you are using.

2 participants