KeyError contig_4488 #42

Open
ZarulHanifah opened this issue Mar 26, 2024 · 0 comments

Hello Vini,

I have been using MetaCoAG for a while, and it works well most of the time, but this run failed with KeyError: 'contig_4488'. The dataset I've been working on is ONT, assembled with metaFlye.

contig_4488 is not present in my Flye assembly. An edge_4488 is present in the assembly graph, though (could this be the issue?).

grep -w "contig_4488\|edge_4488" /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag 
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:S	edge_4488	GGCATGACGCCCAGTACCACCACGTACGGGACAGGCATCAATAGCAACACGGGCCTCGGCGCTACTAACAATGGCAATGCCGGCACGACACCCGGCACCGGCGTCTCCGGGGCCGGCAGCAGCGGCGCGACGATGGGCACCAGCGGCACCACAGGCCTCGGCAGTACCTACAATGGCACCACCGGTACGACGCTCGGCACCGGCACCGGTACAACCGGCGTCGGCGCCAATGGCCTCGGCACCGGCGGCGCCACGGGCCTCGGCGGCACCGACAACGGCGCCACCGGCGCGACGCCGGGCACCGGCGGCACTGGAGCGGGGACCGGCGGCACTGGCGGTACTGGCGGCCGGTAAGGCACCCGGAGTACGCCGCTAACGGCGACGGGCGGGGCGAGGAGGCGTCACCCCGTCCGTCGGCGCGCCCGCGCCGGAAAGCTGACCCGTTCCTCGATGCCGGCCGGGTCCTCGTGAATGATGATCTCGGCGTGCGGAAAGGCGCGCTGCAGCTGCGCCTCGACCGCGTCGGAAATCTGGTGCGCGCGCGACAGGCTCATCGCGCCGTCCATCTCGATATGCAGCTGAATAAACGCGGTCGGCCCGGCGATGCGGGTGCGGATGTCATGCACCGCGGTGACTTCGGGATGGCTTTCGGCGATCGCGCGGACCCGGGCGCGCTCCGAATCGGGCAATTCGCGGTCCATCAGCTGGGTCAGCGACAATCGCGCGATCTTGAATGCCCCGCGGATGAGCCACAGCCCGACCGCAGCGCCGAACAGCGGGTCGAGCAGCGGCATCGGAAAGGAGCTGCCGATCGCCAGCGTCGCGATGACGCCGAGGTTCAGGATCAGGTCGCCGCGATAGTGCAATTCATCGGCGCCGATCGCCAACGAGCCGGTGCGTTTGACGACGTAGCGCTGGTAGAGAACCAGGCCGAGCGTCATGGCGATCGCCACCAGCATGACCGCGATCCCCGCCGGCGGGTGCGCCACCGGGCGCGGCTCGGCCAGGCGGCGGATCGCCTCGAACATCAACAAGGCAGCGCTGCCGACGAGAAAGGCGGACTGGGCGAGCGCCGCCAACGGCTCGGCCTTGCCGTGGCCGAAGCGGTGCTGGCGGTCGGGCGGCGTCGCGGCGCGCCGCACGGCGAACAGATTGACCAGCGAGGCGACGGCATCGACCAGCGAATCGACGAGGCTCGACAACAGGGCGACCGAGCCGGTGCCGATCCAGGCGGCGAGCTTGGCGACAATCAGCACCGTCGCGATCGCCAGCGAGGCGGCGGTCGCGCGCCGCCGCAGCATCTGCGCGGCGCCGCGCTCGCTCGTTACCTCGCTCACGGATAGAGGCGCTGTTTGCGCCATCCCTCGCCGTCGCGGACGAACGCCACGCGGTCGTGCAGACGGAACGGCCGCTCCTGCCAAAACTCGACGCTGTCCGGCCATATCCGAAAACCCGACCAGTAGGCGGGTCGCGGCACGGCGGGTTGCTCGGCATAGCGCTGCGAGTACAGCGCGAAGCGGCGCTCCAGCTCGGCGCGCTCGGCGAGCGGGCGCGACTGGTCGGAGGCCCAGGCGCCGATCTGGCTGTCGCGCGGCCGGGTCGCGAAATAGGCGTCGGCCTCGGCCGGCGAGACCGCTCTCGCCTCGCCCTCGATGCGCACCTGGCGGGCCAGCGACTTCCAGTAGAGGCACAGCGCGGCCCGCGGATTGGCCGCCAGCTCCGCGCCCTTGCGGCTGTCGAGATTGGTGTAAAACACGAAGCCGCGCTGGTCGGCGCCCTTGAGCAGCACCGCGCGCAACGACGGCCGCCCGTCCGCTGTCGCGGTCGCCAGCATCGTCGCCTCGGGGATCGGCTCGCACTGCGCGGCCAGCGCGAACCAGCGCGCGAACGGCGCGAACGGTTCGTTCTCGGCGATCTCGTCGGTCATTGCG
TGAGGTGGCTCCGCTTTGGTTGTGCGCGCCGGAGCCTTCCCTACTCCGCCCCGCGATCCTCGGCAACCGCCCTGCTCGACACGATCGCGGCCGCCGGCGCCGAAGAAGGGCCGCGGCCGCGGATCTCCGCCAGCAGCGCCGCCAAGGTCACTCGCATCGCCGCCGCCTCGGCCTTGACGATCCGCTCCATCGCCGGCGCGACCTGGCGCTGCCACGACGCCAGCGGCCGCGCCAGCCAGCTGCCGGCGAGCGGCAGACCGAGCGCCAGATAAAGATCGTGCAGCGTCGCGCTATGCGGGTCCCAGGCGAGCACCCAGGCGCCGTCCTGGGTCGGCGCGGTGAACCCGGCCTCGGCGAGGATCTGCAGATGCTCGTCGGCGACCGAGGTCGGCACGCCGAGTTCGCTCGCCAGCATCGCGGTGCGGCAGCGCAGGCCGTGCTGCTGCGCCCGCGCCAGCGCGGCAATCAGCGCCAGCGCGAAACCGAGCCTCACGCCGCCGCTGCTCAGATGCGACAATCGCTCATCGACCCGCCAGGTCGGCAGGTTGGCGGCGACCACGGCGCCGAGCAATACCGCATTCCAGGTGACGTACATCCACAACAGAAAGATCGGGATCGCCGCGAGCGCGCCATAGACGGTCTGATAGAACGACGAGGCGGCGATGTAGATGGAAAATCCAACCTTCAGGATCTCGATGGCGGCCGCGGCGACCGCGGCGCCGAGGAGGCCGTCGCGCCAGCGCACCGCACAATTCGGAATGAGGCAATAGAGCAGTGTGCAGGCGATCAACTCCAACACGAACGGGACAAGGCGCGCGACGACATGCGGCCAGCCGCTCGTCAGCTCCGTCACCAGCGCCGGGTTGAGGCCGGCATGGCGGGCCGCCGTGTCGAGATAGGTCGACAGGGTCAGGCTCATGCCGACCAGCAGCGGGCCCAACGTGATCAGCGTCCAATAGGCGAGCACCCGCTGCACCCAGGGCCGCGGCGTCGTGACCCGCCACAGCGCATTGAGGCGGTCCTCGACCGTAACCAGCAGCAGGACGCCGGTGGCGGCGATGCCGACGAGACCGATCGCGGTCGCCTGCGCCGCCGAACCGGCGAAATACTGGAACCACTGCGCCGCCTGCTCGCTGATCGCCGGCACGAAATTACGAAACAACAGCGCCGGCAGGTCCTGCCGCGCCGGCGCGAAACTCGGGAAGACCGACAGGACGCCGAGCCCGACGACGCCAAGCGGCACCAGCGACACCAGGGTCGTGTAGCTGAGCGCGCCCGAGGCGGCAAAGCAGCCGTCATGGTTGAACCGGTGCAGCGCATAGCGGCAGAAGGTCAGCACCGCCCTGAGCCGGCGGCGCAGCACGCCGTGGCCAGAGTCTCGGCGGCTGAACTTGGCGCGGCCGGGCGACGGAGGACCGCGATGTCG	dp:i:32
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4485	-	edge_4488	+	0M	RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4487	-	edge_4488	+	0M	RC:i:5
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	+	edge_265255	+	0M	RC:i:7
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	+	edge_265254	+	0M	RC:i:16
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_265252	-	0M	RC:i:2
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_100100	-	0M	RC:i:13
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:L	edge_4488	-	edge_265253	+	0M	RC:i:40
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_4485	edge_265255-,edge_4488-,edge_4485+,edge_8112-	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_4487	edge_265254-,edge_4488-,edge_4487+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_68975	edge_4488-,edge_265253+,edge_24317-,edge_68975+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_100100	edge_277711+,edge_100100+,edge_4488+,edge_265254+	*
/fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa:P	contig_265252	edge_4474-,edge_265252+,edge_4488+	*

As you can see, "contig_4488" does not appear in any of the input files given to MetaCoAG; only edge_4488 matches, and only in the assembly graph.
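The same check can be done without grep. The following is a minimal sketch, not part of MetaCoAG: it reads the GFA `P` lines from the grep output above, lists which contig paths traverse edge_4488, and confirms that no path is named contig_4488. The function name `paths_by_edge` is hypothetical, and the parsing assumes tab-separated `P` lines with comma-separated, orientation-suffixed segment names, as in the output shown.

```python
from collections import defaultdict


def paths_by_edge(gfa_lines):
    """Map each edge name to the path (contig) names whose P line traverses it.

    Also returns the set of all path names seen.
    """
    edge_to_paths = defaultdict(list)
    path_names = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Only P (path) records are relevant; skip S, L and anything malformed.
        if fields[0] != "P" or len(fields) < 3:
            continue
        path_name, segments = fields[1], fields[2]
        path_names.add(path_name)
        for segment in segments.split(","):
            # Drop the trailing orientation sign, e.g. "edge_4488-" -> "edge_4488".
            edge_to_paths[segment.rstrip("+-")].append(path_name)
    return edge_to_paths, path_names


# P lines copied from the grep output above (file-name prefix removed).
sample = [
    "P\tcontig_4485\tedge_265255-,edge_4488-,edge_4485+,edge_8112-\t*",
    "P\tcontig_4487\tedge_265254-,edge_4488-,edge_4487+\t*",
    "P\tcontig_68975\tedge_4488-,edge_265253+,edge_24317-,edge_68975+\t*",
    "P\tcontig_100100\tedge_277711+,edge_100100+,edge_4488+,edge_265254+\t*",
    "P\tcontig_265252\tedge_4474-,edge_265252+,edge_4488+\t*",
]

edge_to_paths, path_names = paths_by_edge(sample)
print(edge_to_paths["edge_4488"])
# ['contig_4485', 'contig_4487', 'contig_68975', 'contig_100100', 'contig_265252']
print("contig_4488" in path_names)
# False
```

For the full graph, the same function can be fed the open file object for assembly_graph.gfa instead of the `sample` list.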

The command executed:

metacoag --assembler flye \
    --graph /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa \
    --contigs /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta \
    --paths /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt \
    --abundance /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv \
    --output $outdir &> /fs03/jm41/Zarul/C002_D1_results/log/metacoag_medaka/log.log

Here is the error message:

2024-03-27 02:39:34,410 - INFO - Welcome to MetaCoAG: Binning Metagenomic Contigs via Composition, Coverage and Assembly Graphs.
2024-03-27 02:39:34,429 - INFO - Input arguments: 
2024-03-27 02:39:34,430 - INFO - Assembler used: flye
2024-03-27 02:39:34,430 - INFO - Contigs file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly.fasta
2024-03-27 02:39:34,430 - INFO - Assembly graph file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_graph.gfa
2024-03-27 02:39:34,430 - INFO - Contig paths file: /fs03/jm41/Zarul/C002_D1_results/flye/assembly_info.txt
2024-03-27 02:39:34,430 - INFO - Abundance file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag/coverm_abundance.tsv
2024-03-27 02:39:34,430 - INFO - Final binning output file: /fs03/jm41/Zarul/C002_D1_results/binning_medaka/metacoag
2024-03-27 02:39:34,430 - INFO - Marker gene file hmm: auxiliary/marker.hmm
2024-03-27 02:39:34,430 - INFO - Minimum length of contigs to consider: 1000
2024-03-27 02:39:34,430 - INFO - Depth to consider for label propagation: 10
2024-03-27 02:39:34,431 - INFO - p_intra: 0.1
2024-03-27 02:39:34,431 - INFO - p_inter: 0.01
2024-03-27 02:39:34,431 - INFO - Do not use --cut_tc: False
2024-03-27 02:39:34,431 - INFO - mg_threshold: 0.5
2024-03-27 02:39:34,431 - INFO - bin_mg_threshold: 0.33333
2024-03-27 02:39:34,431 - INFO - min_bin_size: 200000 base pairs
2024-03-27 02:39:34,431 - INFO - d_limit: 20
2024-03-27 02:39:34,431 - INFO - Number of threads: 8
2024-03-27 02:39:34,431 - INFO - MetaCoAG started
2024-03-27 02:39:53,232 - INFO - Total number of contigs available: 269678
2024-03-27 02:39:58,801 - INFO - Total number of edges in the assembly graph: 77552
2024-03-27 02:39:58,928 - INFO - Total isolated contigs in the assembly graph: 244283
2024-03-27 02:39:58,929 - INFO - Obtaining lengths and coverage values of contigs
2024-03-27 02:40:18,190 - INFO - Total long contigs: 267613
2024-03-27 02:40:18,190 - INFO - Total isolated long contigs in the assembly graph: 243244
2024-03-27 02:40:18,191 - INFO - Obtaining tetranucleotide frequencies of contigs
2024-03-27 02:47:08,567 - INFO - Scanning for single-copy marker genes
2024-03-27 02:47:08,636 - INFO - .hmmout file already exists
2024-03-27 02:47:08,636 - INFO - Obtaining contigs with single-copy marker genes
Traceback (most recent call last):
  File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 1260, in <module>
    main()
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mzar0002/miniconda3/envs/metacoag_/bin/metacoag", line 613, in main
    ) = marker_gene_utils.get_contigs_with_marker_genes(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fs03/jm41/Zarul/envs/metacoag/lib/python3.12/site-packages/metacoag_utils/marker_gene_utils.py", line 147, in get_contigs_with_marker_genes
    contig_num = contig_names_rev[contig_name]
                 ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'contig_4488'
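
The log above reports ".hmmout file already exists", and the KeyError is raised while looking up a contig name taken from the marker gene results. One possibility, which I have not confirmed, is that the reused .hmmout file refers to contigs from a different assembly. Below is a minimal sketch of how that could be checked. It is not MetaCoAG's own code: the function names are hypothetical, and it assumes that the first column of each non-comment .hmmout line is a gene name of the form `<contig>_<start>_<end>_<strand>`. The sample lines are toy data for illustration only.

```python
def fasta_ids(fasta_lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def hmmout_contigs(hmmout_lines, suffix_fields=3):
    """Contig names referenced in an hmmout-style table.

    Assumption: column 1 is '<contig>_<start>_<end>_<strand>', so the last
    `suffix_fields` underscore-separated fields are stripped.
    """
    names = set()
    for line in hmmout_lines:
        if line.startswith("#") or not line.strip():
            continue
        gene = line.split()[0]
        names.add("_".join(gene.split("_")[:-suffix_fields]))
    return names


def missing_contigs(fasta_lines, hmmout_lines):
    """Contigs named in the hmmout table that have no record in the FASTA."""
    return sorted(hmmout_contigs(hmmout_lines) - fasta_ids(fasta_lines))


# Toy data (not taken from my real files).
toy_fasta = [">contig_4485", "ACGT", ">contig_4487 some description", "ACGT"]
toy_hmmout = [
    "# target name  accession  query name",
    "contig_4485_10_400_+  -  marker_a",
    "contig_4488_5_300_-   -  marker_b",
]

print(missing_contigs(toy_fasta, toy_hmmout))
# ['contig_4488']
```

If the real files give a non-empty list, the .hmmout file was probably produced from a different set of contigs than the current assembly.fasta.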

Thank you 🙏
