insertions.fa contains no insertion sequences #205

Open
ethering opened this issue Jan 19, 2024 · 3 comments

@ethering

Hi,
I'm running SURVIVOR v1.0.7 and I'm generating a simulated genome sequence with SVs in order to map my own reads to it and call SVs.
First I'm generating a parameters file:

$ SURVIVOR simSV test_params.param

Output (I've increased the INDEL_number value to ensure insertions are generated):

PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 10
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2

Then I generated a simulated reference sequence containing the SVs (option 3 = 1):

$ SURVIVOR simSV reference.fasta test_params.param 0.1 1 simulated
# Chrs passed size threshold:4
generate SV
apply mut ref!
apply: Mt 21146 4
apply: Mt 42091 4
apply: Mt 43332 4
apply: Chr3 45180 1
apply: Chr3 344508 2
apply: Chr2 809100 2
apply: Chr3 869844 1
apply: Chr1 1336924 4
apply: Chr2 1360145 2
apply: Chr3 2220985 2
apply: Chr1 1233970 3
apply: Chr2 3842835 4
apply: Chr2 4354418 1
apply: Chr1 4596703 4
apply: Chr2 860780 3
apply: Chr1 4982876 1
Post SV simulation Genome checking:
generate SNP
write genome
write SV
Done: SV+SNP simulated
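
The `apply:` lines in the log above can be grouped by their trailing numeric code to see how many events of each kind were applied. What follows is a minimal Python sketch, not part of SURVIVOR: the function name `tally_apply_codes` is hypothetical, and because the thread does not document what each code means, the code makes no assumption about which SV type a code stands for.

```python
from collections import defaultdict


def tally_apply_codes(log_text):
    """Group 'apply: <chrom> <pos> <code>' log lines by their trailing code.

    Returns a dict mapping code -> list of (chrom, pos) tuples.
    The meaning of each code is NOT assumed here.
    """
    by_code = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split()
        # Only accept lines of the exact shape: "apply:", chrom, pos, code.
        # This skips lines such as "apply mut ref!".
        if len(parts) == 4 and parts[0] == "apply:":
            chrom, pos, code = parts[1], int(parts[2]), int(parts[3])
            by_code[code].append((chrom, pos))
    return dict(by_code)
```

In this particular log, the four entries with code 1 (Chr3 45180, Chr3 869844, Chr2 4354418, Chr1 4982876) sit at the same positions as the headers that appear in simulated.insertions.fa. That suggests, but does not confirm, that code 1 marks insertions.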

The problem is this: sometimes when I run SURVIVOR simSV to generate the SVs, simulated.insertions.fa is completely empty. Other times it is not empty, but contains only the FASTA header lines of the insertions, with no sequence:

$ cat simulated.insertions.fa 
>Chr3_45180

>Chr3_869844

>Chr2_4354418

>Chr1_4982876

I've run SURVIVOR simSV a number of times, using around 5 different param files (with different SV min/max sizes), and this behaviour is consistent. However, when I run simSV with option 3 = 0, my insertions.fa file does contain the insertion sequences.
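
To check an insertions file for this symptom without reading it by eye, something like the following Python sketch could be used. It is only an illustration under simple assumptions (plain-text FASTA, one record per `>` header); the function name `find_empty_fasta_records` is hypothetical and not part of SURVIVOR.

```python
def find_empty_fasta_records(fasta_text):
    """Return the headers of FASTA records that carry no sequence characters."""
    empty = []
    header = None   # header of the record currently being read
    seq_len = 0     # number of sequence characters seen for that record
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: flush the previous one if it was empty.
            if header is not None and seq_len == 0:
                empty.append(header)
            header = line[1:]
            seq_len = 0
        elif header is not None:
            seq_len += len(line)
    # Flush the final record.
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty


# Example use (file name taken from the output prefix used above):
#   with open("simulated.insertions.fa") as fh:
#       print(find_empty_fasta_records(fh.read()))
```

For the file shown above, every header would be reported as empty; for a completely empty file, the returned list is empty as well, so the two cases described here need to be told apart by also checking the file size.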

Perhaps I've misunderstood something here, but intuitively I would have expected the opposite. With option 3 = 1 (simulate genome), I would expect insertions.fa to contain the actual insertions in the simulated genome. With option 3 = 0 (simulate reads), I would expect insertions.fa to be empty, because the insertions are generated by SURVIVOR simreads, which doesn't require the insertions.fa file.

@fritzsedlazeck
Owner

Hi Graham,
Sorry for this. Do you see the insertions in the VCF file?
Thanks
Fritz

@ethering
Author

ethering commented Jan 19, 2024

Hi Fritz,
Yes, they're at the end of the VCF file. Here are the VCF entries:

Chr3	45180	INS1487952SURVIVOR	N	<INS>	.	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=45341;SVLEN=161	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1
Chr3	869844	INS1487955SURVIVOR	N	<INS>	.	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=Chr3;END=870223;SVLEN=379	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1
Chr2	4354418	INS1487961SURVIVOR	N	<INS>	.	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=Chr2;END=4354910;SVLEN=492	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1
Chr1	4982876	INS1487964SURVIVOR	N	<INS>	.	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=SURVIVOR_sim;CHR2=Chr1;END=4983124;SVLEN=248	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1
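
These entries can be cross-checked against the FASTA headers programmatically. The Python sketch below is a rough illustration only: it assumes tab-separated VCF lines with `END` and `SVLEN` in the INFO column, as in the records above, and the helper names `parse_ins_records` and `expected_fasta_headers` are hypothetical. The `<chrom>_<pos>` header format is inferred from the `cat` output earlier in this thread.

```python
def parse_ins_records(vcf_text):
    """Return (chrom, pos, end, svlen) for each SVTYPE=INS line in VCF text."""
    records = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO column: semicolon-separated; keep only key=value pairs
        # (flags such as PRECISE have no '=' and are ignored).
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        if info.get("SVTYPE") != "INS":
            continue
        records.append(
            (fields[0], int(fields[1]), int(info["END"]), int(info["SVLEN"]))
        )
    return records


def expected_fasta_headers(records):
    """Build the '<chrom>_<pos>' names seen as headers in insertions.fa."""
    return [f"{chrom}_{pos}" for chrom, pos, _end, _svlen in records]
```

For the four records above, `END - POS` equals `SVLEN` in each case (161, 379, 492 and 248), and the derived names match the four headers in simulated.insertions.fa, so the VCF and the FASTA headers agree on where the insertions are; only the sequences are missing.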

When I map real reads to the simulated reference with Minimap2 and call SVs with Sniffles, the insertions are also reported in eval_simulated_right.vcf.

@fritzsedlazeck
Owner

OK, that might be the best workaround for now. Sorry about this. Lately I have been more focused on the VCF file than the FASTA file.
Cheers
Fritz
