Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error running vcf2xg.sh on c. elegans #20

Open
moldach opened this issue May 1, 2020 · 22 comments
Open

Error running vcf2xg.sh on c. elegans #20

moldach opened this issue May 1, 2020 · 22 comments

Comments

@moldach
Copy link

moldach commented May 1, 2020

First off, I have to say great job on the tool! I've played around with the practice data and now I'm very excited to try this tool out and show it to my laboratory. I've been evaluating ~10 tools for looking at SVs and yours by-far looks the most promising!

I'm getting an error trying to convert a .VCF to .xg using the vcf2xg.sh script on the reference genome (C. elegans) used for alignment.

(base):~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh \
    ~/projects/maddog/CNVnator/470.vcf  \
    momig-vcf-test-output ~/vg \
    c_elegans.PRJNA13758.WS265.genomic.fa.gz

Traceback (most recent call last):
	3: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
/home/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `+': no implicit conversion of nil into String (TypeError)
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
+ uniq
++ sed 1d .//momig-vcf-test-output.pcf
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[W::fai_get_val] Reference :0-270000000 not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in :0-270000000
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_YNr9xC/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

I tried another call set (Delly) to see if it's something specific to the format of VCF but got a similar error:

(base) mtg@mtg-ThinkPad-P53:~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/delly/delly.vcf  momig-vcf-test-output ~/vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
+ uniq
++ sed 1d .//momig-vcf-test-output.pcf
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[faidx] Truncated sequence: I:15066834-270000000
[faidx] Truncated sequence: II:15124909-270000000
[faidx] Truncated sequence: III:13778981-270000000
[faidx] Truncated sequence: IV:17493322-270000000
[faidx] Truncated sequence: MtDNA:13794-270000000
[faidx] Truncated sequence: V:20916455-270000000
[faidx] Truncated sequence: X:17629941-270000000
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_Vnjilw/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

The stack-trace from Delly is:

Crash report for vg v1.23.0 "Lavello"
Stack trace (most recent call last):
#8    Object "/home/vg", at 0x4dd279, in _start
#7    Object "/home/vg", at 0x1bb8ec8, in __libc_start_main
#6    Object "/home/vg", at 0x40aed7, in main
#5    Object "/home/vg", at 0x9e77c7, in vg::subcommand::Subcommand::operator()(int, char**) con>
#4    Object "/home/vg", at 0x9f6a61, in main_view(int, char**)
#3    Object "/home/vg", at 0xa6c4ab, in vg::get_input_file(std::__cxx11::basic_string<char, std>
#2    Object "/home/vg", at 0x9f2ad5, in std::_Function_handler<void (std::istream&), main_view(>
#1    Object "/home/vg", at 0xbc103b, in vg::gfa_to_graph(std::istream&, vg::VG*, bool)
#0    Object "/home/vg", at 0x12e3f20, in stPinchThread_getLast

Any idea what could be happening here? One thought is that chromosome in H. sapiens are usuallly labeled chr1 but in C. elegans they are labeled (I, II, III, IV, V, X, MtDNA) - not sure if this information helps?

As for the Reference :0-270000000 part I have no idea of the context of this number: 270000000? It seems like a very round number? The entire genome of C. elegans is ~100286401bp.

@6br
Copy link
Contributor

6br commented May 2, 2020

Thank you for the feedback!
As you said, we regarded the input as H. sapiens and thus we used 270000000 as the maximum length as temporally, but, actually we can calculate the actual length. I'll fix it. The latter issue needs further investigation because I have not used Delly as an input. I will tell you if I know something.

@6br
Copy link
Contributor

6br commented May 5, 2020

I fixed the truncated reference problem. For the latter data, the gfa file seems to be generated at least. https://github.com/sjackman/gfalint can be used for validation of gfa file.

@moldach
Copy link
Author

moldach commented May 6, 2020

Okay I've remove MoMI-G and pulled in the new version with git clone and then re-ran for Delly.

(base) mtg@mtg-ThinkPad-P53:~/TEST-MOMIG$ sudo bash ~/MoMI-G/scripts/vcf2xg.sh ~/mtg-lab-projects/maddog/delly/delly.vcf  momig-vcf-test-output ~/vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//momig-vcf-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-vcf-test-output.pcf
+ sort -t, -k 7n .//momig-vcf-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//momig-vcf-test-output.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-vcf-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-vcf-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-vcf-test-output.pcf.bp.merged.tsv .//momig-vcf-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
ERROR: Signal 11 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_A71OY2/stacktrace.txt
Please include the stack trace file in your bug report!
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

and the stack-trace:

Crash report for vg v1.23.0 "Lavello"
Stack trace (most recent call last):
#8    Object "/home/mtg/vg", at 0x4dd279, in _start
#7    Object "/home/mtg/vg", at 0x1bb8ec8, in __libc_start_main
#6    Object "/home/mtg/vg", at 0x40aed7, in main
#5    Object "/home/mtg/vg", at 0x9e77c7, in vg::subcommand::Subcommand::operator()(int, char**) const
#4    Object "/home/mtg/vg", at 0x9f6a61, in main_view(int, char**)
#3    Object "/home/mtg/vg", at 0xa6c4ab, in vg::get_input_file(std::__cxx11::basic_string<char, std:>
#2    Object "/home/mtg/vg", at 0x9f2ad5, in std::_Function_handler<void (std::istream&), main_view(i>
#1    Object "/home/mtg/vg", at 0xbc103b, in vg::gfa_to_graph(std::istream&, vg::VG*, bool)
#0    Object "/home/mtg/vg", at 0x12e3f20, in stPinchThread_getLast

It's doesn't look like it generated the .gfa file (although there is a .ggf?)

I have the following call sets if you are more familiar with one of them we can try that: CNVnator, DeepVariant, Delly, GRIDSS, Lumpy, Manta, MindTheGap, NGSep, Pindel, Tardis.

@6br
Copy link
Contributor

6br commented May 7, 2020

We defined .ggf as a subset of .gfa, so gfalint <output.ggf works.
Sorry I am not familiar with these SV callers since they are for Illumina.

@moldach
Copy link
Author

moldach commented May 21, 2020

I've posted a follow up in the gfalint to show the output of MoMI-G.
If you'd like a more detailed error message I can try a couple other tools:

"You may wish to try another tool that may give a better error message, such as perhaps Bandage if it's GFA1, or Gfapy for either GFA1 or GFA2."

Which format is this in?

I tried Gfapy just now but ran into an issue

Any idea why this odd output is being produced by MoMI-G?

@6br
Copy link
Contributor

6br commented May 21, 2020

Is it possible to try it again on the latest MoMI-G master?
MoMI-G tools output is compatible with GFA1 format. I'm sorry that I have no idea why gfapy does not work.

@moldach
Copy link
Author

moldach commented May 26, 2020

Okay making some progress now.

I've tried again off of the latest MoMI-G master (git clone). There is no warning issued if the user forgets to install vg (I did). However, I think I've now done that properly.

cd ~/projects/TEST-MOMIG
wget https://github.com/vgteam/vg/releases/download/v1.24.0/vg
chmod +x vg

And test it's working:

(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ ./vg
vg: variation graph tool, version v1.24.0 "Montieri"

usage: ./vg <command> [options]

main mapping and calling pipeline:
  -- construct     graph construction
  -- index         index graphs or alignments for random access or mapping
  -- map           MEM-based read alignment
  -- augment       augment a graph from an alignment
  -- pack          convert alignments to a compact coverage index
  -- call          call or genotype VCF variants
  -- help          show all subcommands

For more commands, type `vg help`.
For technical support, please visit: https://www.biostars.org/t/vg/

Now I run MoMI-G:

(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/CNVnator/470.vcf momig-test-output ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
Traceback (most recent call last):
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `+': no implicit conversion of nil into String (TypeError)
	4: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `rescue in block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `+': no implicit conversion of Array into String (TypeError)
+ input=.//momig-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-test-output.pcf
+ sort -t, -k 7n .//momig-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ uniq
++ sed 1d .//momig-test-output.pcf
+ sort -k 1,1 -k 2,2n
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-test-output.pcf.bp.merged.tsv .//momig-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz


(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ ll
total 430272
drwxrwxr-x 4 mtg  mtg       4096 May 26 11:21 ./
drwxrwxr-x 7 mtg  mtg       4096 May 22 10:49 ../
-rw-r--r-- 1 root root     21912 May 26 11:21 breakpoint_list_tmp.tsv
-rwxrwx--- 1 mtg  mtg  101957874 May  1 16:05 c_elegans.PRJNA13758.WS265.genomic.fa*
-rwxrwx--- 1 mtg  mtg        181 May  1 16:01 c_elegans.PRJNA13758.WS265.genomic.fa.fai*
-rw-r----- 1 mtg  mtg   30726020 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz
-rw-r--r-- 1 root root       181 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai
-rw-r--r-- 1 root root     24984 May  1 16:00 c_elegans.PRJNA13758.WS265.genomic.fa.gz.gzi
drwxrwxr-x 6 mtg  mtg       4096 May  6 09:27 gfalint/
-rw-r--r-- 1 root root 100334394 May 26 11:21 momig-test-output.ggf
-rw-r--r-- 1 root root       112 May 26 11:21 momig-test-output.json
-rw-r--r-- 1 root root       113 May 26 11:21 momig-test-output.pcf
-rw-r--r-- 1 root root     21912 May 26 11:21 momig-test-output.pcf.bp.merged.tsv
-rw-r--r-- 1 root root         0 May 26 11:21 momig-test-output.pcf.bp.tsv
-rw-r--r-- 1 root root       113 May 26 11:21 momig-test-output.pcf.output.pcf
-rw-r--r-- 1 root root 104128225 May 26 11:21 momig-test-output.vg
-rw-r--r-- 1 root root  63365361 May 26 11:21 momig-test-output.xg
drwxrwxr-x 3 mtg  mtg       4096 May 20 19:35 src/
-rwxrwxr-x 1 mtg  mtg   39823464 May 11 16:12 vg*

Should I be concerned about the TypeError's?

@moldach
Copy link
Author

moldach commented May 26, 2020

Because the momig-test-output.xg file is there it looks like vg worked. I'm now wondering about the next step.

I need to create a static/ directory in the current working directory and then move the momig-test-output.xg and momig-test-output.pcf into static/ correct?

What about the config.yaml: a configuration file? What is this supposed to look like?

@6br
Copy link
Contributor

6br commented May 27, 2020

Actually there is a line in vcf that is not parsed by momig-tools, and I intend to output error message when it encounters such a line. But it won't work now due to a bug. I suppose the output pcf file might be insufficient. I'll fix it soon.

@6br
Copy link
Contributor

6br commented May 27, 2020

After that, this configuration YAML file is needed to be loaded as the next step.
https://momi-g.readthedocs.io/en/latest/configure_file.html
The instruction is as follows.
https://momi-g.readthedocs.io/en/latest/quick_start.html#how-to-use-your-own-custom-data

  1. git clone from https://github.com/MoMI-G/MoMI-G.
  2. mkdir static on the directory MoMI-G clones.
  3. Put your XG data in the static folder like static/data.xg.
  4. Add configure file on static/config.yaml.
  5. Modify Docker.backend file.
  6. Build backend server and run.

The fastest way is to copy this config file and change the file name of xg in the following config.yaml.
https://github.com/MoMI-G/MoMI-G_backend/blob/master/sample/chm1/config.yaml

@6br 6br mentioned this issue May 27, 2020
@6br
Copy link
Contributor

6br commented May 27, 2020

Is it possible for you to try it on the latest master and tell me the output of MoMI-G script? I suppose the lines starting with Error line: would be output.

@moldach
Copy link
Author

moldach commented May 27, 2020

Actually there is a line in vcf that is not parsed by momig-tools, and I intend to output error message when it encounters such a line. But it won't work now due to a bug. I suppose the output pcf file might be insufficient. I'll fix it soon ...
Is it possible for you to try it on the latest master and tell me the output of MoMI-G script? I suppose the lines starting with Error line: would be output.

Sure I can try that:

(base) mtg@mtg-ThinkPad-P53:~/bin$ rm -rf MoMI-G/
(base) mtg@mtg-ThinkPad-P53:~/bin$ git clone https://github.com/MoMI-G/MoMI-G
Cloning into 'MoMI-G'...
remote: Enumerating objects: 43, done.
remote: Counting objects: 100% (43/43), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 918 (delta 28), reused 20 (delta 12), pack-reused 875
Receiving objects: 100% (918/918), 5.75 MiB | 1.12 MiB/s, done.
Resolving deltas: 100% (598/598), done.
(base) mtg@mtg-ThinkPad-P53:~/bin$ cd ~/projects/TEST-MOMIG
(base) mtg@mtg-ThinkPad-P53:~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh ~/projects/maddog/CNVnator/470.vcf momig-test-output ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
[sudo] password for mtg: 
Traceback (most recent call last):
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:90:in `+': no implicit conversion of nil into String (TypeError)
	3: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `<main>'
	2: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:10:in `each'
	1: from /home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:89:in `block in <main>'
/home/mtg/bin/MoMI-G/scripts/vcf2gfa/vcf2pcf.rb:92:in `rescue in block in <main>': Error line: I	4282001	470_CNVnator_dup_1		4295000	.	13000	DUP	GT:CN	./1:2 (RuntimeError)
+ input=.//momig-test-output.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//momig-test-output.pcf
+ sort -t, -k 7n .//momig-test-output.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//momig-test-output.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//momig-test-output.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//momig-test-output.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//momig-test-output.pcf.bp.merged.tsv .//momig-test-output.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz

So this is the error output:

Error line: I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2 (RuntimeError)

@moldach
Copy link
Author

moldach commented May 28, 2020

Okay so I now have a config.yaml file that looks like this:

bin:
  vg: "vg"
  vg_tmp: "vg"
  vg_volume_prefix: ""
  graphviz: "dot"
  fa22bit: "faToTwoBit"
  bigbed: "bedToBigBed"
reference:
  chroms: "static/GRCh.json"
  data:
    - name: "ce11"
      features:
        - name: 'gene_annotation'
          url: "static/c_elegans.PRJNA13758.WS265.annotations.gff3"
          chr_prefix: ""
data:
  - name: "maddog"
    desc: "2020-05-27"
    chr_prefix: ""
    ref_id: "ce11"
    source:
      xg: "static/Celegans.xg"
      csv: "static/Celegans.pcf"
      gam: ""
      gamindex: ""
    features: []
    static_files: []

However, I cannot find an example of GRCh.json file that includes the color and length for every chromosome. Could you please share an example so I can change for my species, thank you.

import json
data = {}
data['chromosome'] = []

data['chromosome'].append({
 'name': 'I",
 'length': 15072434,
 'color': '#4477AA'
})
data['chromosome'].append({
 'name': 'II",
 'length': ,15279421
 'color': '#66CCEE'
})

...

@6br
Copy link
Contributor

6br commented May 28, 2020

Thank you for reporting an error. I'll check it.
This is an example of chromosome length and color json file.
https://raw.githubusercontent.com/MoMI-G/MoMI-G_backend/master/sample/chm1/GRCh.json

@6br
Copy link
Contributor

6br commented May 28, 2020

I suspect this line is invalid.
I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2

Usually, vcf file is described as a tab-delimited text as #CHROM POS ID REF ALT QUAL FILTER INFO, but the previous line does not have any REF field (because there are two tab characters between 470_CNVnator_dup_1 and 4295000), which makes the script fail. That's why MoMI-G tools cannot handle well in this VCF dialect.
Do you know where is the specification of this VCF variant?

@moldach
Copy link
Author

moldach commented May 28, 2020

I suspect this line is invalid.
I 4282001 470_CNVnator_dup_1 4295000 . 13000 DUP GT:CN ./1:2

Luckily I have a number of variant call sets from a variety of tools so we can tell if this is unique to CNVnator or not:

  • Pindel: Error line: II5008800 . 5008801 . 1 GT:AD 0/0:32,2 (RuntimeError)
  • Delly: Another error (save this for another issue)
  • GRIDSS: Error line: I 643 gridss0b_1b 659.21 0 GT:ASQ:ASRP:ASSR:BANRP:BANRPQ:BANSR:BANSRQ:BAQ:BASRP:BASSR:BQ:BSC:BSCQ:BUM:BUMQ:BVF:CASQ:IC:IQ:QUAL:RASQ:REF:REFPAIR:RP:RPQ:SR:SRQ:VF .:0.00:0:0:0:0.00:0:0.00:0.00:0:0:659.21:0:0.00:26:659.21:26:0.00:0:0.00:0.00:0.00:42:40:0:0.00:0:0.00:0 (RuntimeError)
  • Lumpy: Error line: I 862 555_1 . 0 V:1105109] GT:SU:PE:SR ./.:7:7:0 (RuntimeError)
  • Error line: I 229193 MantaBND:1:0:1:0:0:0:0 . 0 IV:6006459] (RuntimeError)
  • MindTheGap: For the othervariants.vcf file - Error line: I 15759 bkpt1 . 0 GT 1/1 (RuntimeError)
  • NGSep: Error line: I 855 . 27 0 GT:PL:GQ:DP:BSDP:ACN 0/1:252,193,2481:27:64:0,6,58,0:1,1 (RuntimeError)
  • Tardis: Error line: I 230619 vh_del_2 230730 255 112 DEL GT:CNVL:RP:SR 0/1:0.053729:10:6 (RuntimeError)

For all of the VCF files I have I seem to get an error.

Do you know where is the specification of this VCF variant?

No, I cannot find it on their documentation

@6br
Copy link
Contributor

6br commented May 29, 2020

These errors might be caused due to different reasons. MoMI-G tools does not support Illumina SV callers explicitly. Is it possible to normalize each vcf file with https://github.com/fritzsedlazeck/SURVIVOR before running vcf2xg? (e.g. run SURVIVOR merge file_list.txt 0.1 1 1 1 1 50, and file_list.txt just includes the file name of vcf).

@moldach
Copy link
Author

moldach commented May 29, 2020

I was just working with SURVIVOR the other day so it was quick to generate:

From file_list.txt that looks like:

breakdancer.vcf
cnvnator.vcf
deepVariant.vcf
delly.vcf
gridss.vcf
lumpy.vcf
manta.vcf
mindTheGap.vcf
ngsep.vcf
pindel.vcf
tardis.vcf

Merge:

/bin/SURVIVOR-master/Debug/SURVIVOR merge file_list.txt 0.1 1 1 1 1 50 survivor_merged.vcf
merging entries: 59
merging entries: 14
merging entries: 197
merging entries: 1194
merging entries: 2180
merging entries: 762
merging entries: 920
merging entries: 203
merging entries: 0
merging entries: 4917
merging entries: 747

Lot of errors from this:

(base):~/projects/TEST-MOMIG$ sudo bash ~/bin/MoMI-G/scripts/vcf2xg.sh survivor_merged.vcf survivor-concensus ./vg c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ input=.//survivor-concensus.pcf
+ reference=c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ pref=.//survivor-concensus.pcf
+ sort -t, -k 7n .//survivor-concensus.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//survivor-concensus.pcf
+ uniq
+ '[' '!' -s c_elegans.PRJNA13758.WS265.genomic.fa.gz.fai ']'
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/json_to_breakpoint_list.rb .//survivor-concensus.pcf.bp.tsv c_elegans.PRJNA13758.WS265.genomic.fa.gz
+ cat .//survivor-concensus.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname /home/mtg/bin/MoMI-G/scripts/vcf2gfa/pcf2gfa.sh
+ ruby /home/mtg/bin/MoMI-G/scripts/vcf2gfa/gfa_generator.rb .//survivor-concensus.pcf.bp.merged.tsv .//survivor-concensus.pcf.output.pcf c_elegans.PRJNA13758.WS265.genomic.fa.gz
[W::fai_fetch] Reference chrI:0-838 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:0-838
[W::fai_fetch] Reference chrI:839-861 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:839-861
[W::fai_fetch] Reference chrI:862-990 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:862-990
[W::fai_fetch] Reference chrI:991-994 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:991-994
[W::fai_fetch] Reference chrI:995-1128 not found in FASTA file, returning empty sequence
Failed to fetch sequence in chrI:995-1128

Tried using SURVIVOR to just merge two files, delly and pindel, and I get the same error.

@6br
Copy link
Contributor

6br commented May 30, 2020

SURVIVOR seems to add "chr" as a prefix of each reference genome to normalize, but the original reference genome is started with I, so chrI is failed to detect.
Anyway I am looking for the way to normalize VCF file from various kinds of SV callers.

@fritzsedlazeck
Copy link

Please update SURVIVOR.

I am not 100% sure which subcommand of SURIVOR is used, but I assume the merge .
This behavior in merge has been updated a couple of months ago.

Thanks
Fritz

@Boer223
Copy link

Boer223 commented Jul 14, 2020

I also have similar error, like this:

+ input=.//genomes.23.pcf
+ reference=../westar.fa
+ pref=.//genomes.23.pcf
+ sort -t, -k 7n .//genomes.23.pcf
+ sed -e 's/[ ]*//g'
+ awk -F '[,:]' '{print $1,"\t",$2,"\n",$4,"\t",$5}' /dev/fd/63
+ sort -k 1,1 -k 2,2n
++ sed 1d .//genomes.23.pcf
+ uniq
+ '[' '!' -s ../westar.fa.fai ']'
++ dirname ./scripts/vcf2gfa/pcf2gfa.sh
+ ruby ./scripts/vcf2gfa/json_to_breakpoint_list.rb .//genomes.23.pcf.bp.tsv ../westar.fa
+ cat .//genomes.23.pcf.bp.tsv breakpoint_list_tmp.tsv
+ sort -k 1,1 -k 2,2n
+ uniq
++ dirname ./scripts/vcf2gfa/pcf2gfa.sh
+ ruby ./scripts/vcf2gfa/gfa_generator.rb .//genomes.23.pcf.bp.merged.tsv .//genomes.23.pcf.output.pcf ../westar.fa
./scripts/vcf2gfa/gfa_generator.rb:42: warning: Insecure world writable dir /public in PATH, mode 040777
[W::fai_get_val] Reference :0- not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in :0-
error:[vg view] Input GFA is not acceptable.
error:[gfa_to_handle_graph] Found edge record with missing source name
error[VPKG::load_one]: Correct input type not found while loading handlegraph::MutablePathDeletableHandleGraph
error[VPKG::load_one]: Correct input type not found while loading handlegraph::HandleGraph

And my vcf files were merged with bcftools merge.

@6br
Copy link
Contributor

6br commented Jul 14, 2020

Thank you for reporting. I assume some records in the vcf file are not supported in vcf2vg. I updated the way to report errors on ./scripts/vcf2gfa/gfa_generator.rb.

@moldach moldach changed the title Error running vcf2xg.sh on non-model organism Error running vcf2xg.sh on c. elegans Nov 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants