SURVIVOR simSV - number of SVs doesn't correspond to params file #206

Open
ethering opened this issue Jan 22, 2024 · 0 comments

Hi,
I'm running SURVIVOR v1.0.7, and I've noticed that the number of SV events generated by SURVIVOR simSV (in both the .bed and .vcf files) does not match the .param file and differs depending on the value of option 3 (0 or 1).

Here's what I see:

$ SURVIVOR simSV test.param

Output:

PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 1
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2
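
To compare these requested values against the simulated output, the `*_number` entries can be pulled out of the parameter file. This is a minimal sketch, assuming the `KEY: value` layout printed above; `parse_requested_counts` is a hypothetical helper and not part of SURVIVOR.

```python
# Text of test.param as printed above.
PARAMS = """\
PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 1
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2
"""


def parse_requested_counts(param_text):
    """Return {SV type: requested number} from the text of a parameter file."""
    counts = {}
    for line in param_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only the *_number lines carry event counts; the header line and
        # the *_length lines are skipped.
        if sep and key.endswith("_number"):
            counts[key[: -len("_number")]] = int(value)
    return counts


requested = parse_requested_counts(PARAMS)
print(requested)
```

For the file above this yields 3 DUPLICATION, 1 INDEL, 2 TRANSLOCATION, 4 INVERSION, 2 INV_del and 2 INV_dup events.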

Then using SURVIVOR simSV to generate SVs:

With option 3 = 1, every count is correct except DUP, which is zero (I presume that for inversions, INV_del_number + INV_dup_number = INVERSION_number). The DUP value is also always zero in the true-positive and false-negative sections of SURVIVOR eval.

$ SURVIVOR simSV reference.fasta test.param 0.1 1 test1_sv
$ cat test1_sv.bed

Mt	1098	Mt	1819	INV
Mt	17423	Mt	18216	INV
Chr2	800538	Chr2	800924	INS
Mt	51828	Chr3	1034461	TRA
Mt	54161	Chr3	1036794	TRA
Chr1	1406312	Chr1	1407023	INV
Chr1	2541684	Chr1	2542421	INV
Chr1	1740043	Chr3	3282514	TRA
Chr1	1741044	Chr3	3283515	TRA

With option 3 = 0, I see the following (BED records sorted by SV type for readability):
5 duplication events, not 3
5 indels (1 INS and 4 DEL), not 1
8 inversions, not 4

$ SURVIVOR simSV reference.fasta test.param 0.1 0 test0_sv
$ cat test0_sv.bed
Chr3	1671702	Chr3	1679825	DUP
Chr3	3600129	Chr3	3604236	DUP
Chr3	725731	Chr3	727808	DUP
Chr2	281472	Chr2	282151	DUP
Mt	55970	Mt	56657	DUP
Mt	43737	Mt	43991	INS
Chr2	2719697	Chr2	2719765	DEL
Chr2	2720309	Chr2	2720377	DEL
Chr2	1496557	Chr2	1496622	DEL
Chr2	1497150	Chr2	1497215	DEL
Chr2	721379	Chr3	1055120	TRA
Chr2	722729	Chr3	1056470	TRA
Chr2	4982397	Mt	21418	TRA
Chr2	4985041	Mt	24062	TRA
Mt	36164	Mt	36770	INV
Chr1	3880402	Chr1	3881102	INV
Chr3	3485167	Chr3	3485931	INV
Chr2	353814	Chr2	354459	INV
Chr2	2719765	Chr2	2720309	INV
Chr2	1496622	Chr2	1497150	INV
Mt	55970	Mt	56657	INV
Chr2	281472	Chr2	282151	INV
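
The tallies for the option 3 = 0 run can be reproduced with a short script. This is a minimal sketch, assuming the five-column layout shown in the listing (chromosome, start, chromosome, end, type); `count_sv_types` and `shared_intervals` are hypothetical helpers, not SURVIVOR functions. It counts BED lines, not events, so a translocation that spans two lines is counted twice.

```python
from collections import Counter

# Records copied from test0_sv.bed above; split() accepts tabs or spaces.
BED_OPTION_0 = """\
Chr3 1671702 Chr3 1679825 DUP
Chr3 3600129 Chr3 3604236 DUP
Chr3 725731 Chr3 727808 DUP
Chr2 281472 Chr2 282151 DUP
Mt 55970 Mt 56657 DUP
Mt 43737 Mt 43991 INS
Chr2 2719697 Chr2 2719765 DEL
Chr2 2720309 Chr2 2720377 DEL
Chr2 1496557 Chr2 1496622 DEL
Chr2 1497150 Chr2 1497215 DEL
Chr2 721379 Chr3 1055120 TRA
Chr2 722729 Chr3 1056470 TRA
Chr2 4982397 Mt 21418 TRA
Chr2 4985041 Mt 24062 TRA
Mt 36164 Mt 36770 INV
Chr1 3880402 Chr1 3881102 INV
Chr3 3485167 Chr3 3485931 INV
Chr2 353814 Chr2 354459 INV
Chr2 2719765 Chr2 2720309 INV
Chr2 1496622 Chr2 1497150 INV
Mt 55970 Mt 56657 INV
Chr2 281472 Chr2 282151 INV
"""


def count_sv_types(bed_text):
    """Count BED lines per SV type (the last column)."""
    return Counter(
        line.split()[-1] for line in bed_text.splitlines() if line.strip()
    )


def shared_intervals(bed_text):
    """Return intervals that are listed under more than one SV type."""
    types_by_interval = {}
    for line in bed_text.splitlines():
        if not line.strip():
            continue
        *interval, sv_type = line.split()
        types_by_interval.setdefault(tuple(interval), set()).add(sv_type)
    return {
        interval: sorted(types)
        for interval, types in types_by_interval.items()
        if len(types) > 1
    }


counts = count_sv_types(BED_OPTION_0)
shared = shared_intervals(BED_OPTION_0)
print(dict(counts))
print(shared)
```

On this listing the script reports 5 DUP, 1 INS, 4 DEL, 4 TRA and 8 INV lines, and two intervals (Mt 55970-56657 and Chr2 281472-282151) that appear as both DUP and INV. That overlap may mean the INV_dup events are written as a DUP record plus an INV record, but this is a guess from the coordinates and not documented behaviour.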

Can you comment on this? I've never really understood why SURVIVOR generates different data depending on the intended downstream use (SVs in the reference, or SVs in the reads). What is clear here, though, is that it appears to generate a different number of SVs than requested in the parameter file.

Cheers,
Graham
