[BUG] anvi-run-kegg-kofams terminated prematurely #2256

Open
lulux719 opened this issue Apr 17, 2024 · 1 comment

@lulux719

Short description of the problem

anvi-run-kegg-kofams terminated prematurely after producing the hmm.table file

anvi'o version

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.13
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Installed via conda

Detailed description of the issue

I have tried several times, and anvi-run-kegg-kofams always terminates prematurely after producing the hmm.table file, with the message below.

Done with KOfam 🎊

Number of raw hits in table file .............: 397,424,337
Terminated

Here's the hmm.table generated.

head hmm.table
11548766 - K24524 - 0.0034 19.5 0.0 0.0055 18.8 0.0 1.3 1 0 0 1 1 1 1 -
11690881 - K15921 - 4.7e-199 667.4 13.8 5.2e-199 667.3 13.8 1.0 1 0 0 1 1 1 1 -
11605040 - K15921 - 1.8e-190 639.0 21.8 2.4e-190 638.6 21.8 1.1 1 0 0 1 1 1 1 -
11656118 - K15921 - 1.8e-190 639.0 21.8 2.4e-190 638.6 21.8 1.1 1 0 0 1 1 1 1 -

Here's the info of contigs.db.
anvi-db-info 03_CONTIGS/contigs.db

DB Info (no touch)

Database Path ................................: 03_CONTIGS/contigs.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)

project_name .................................: ob
contigs_db_hash ..............................: hashc9e5c18c
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1193879
total_length .................................: 16702353431
num_splits ...................................: 1450761
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1706651817.91107
gene_function_sources ........................: Pfam
gene_level_taxonomy_source ...................: kaiju

  • Please remember that it is never a good idea to change these values. But in some
    cases it may be absolutely necessary to update something here, and a
    programmer may ask you to run this program and do it. But even then, you
    should be extremely careful.

AVAILABLE GENE CALLERS

  • 'prodigal' (16,415,905 gene calls)
  • 'Ribosomal_RNA_28S' (11 gene calls)
  • 'Ribosomal_RNA_23S' (3,430 gene calls)
  • 'Ribosomal_RNA_18S' (13 gene calls)
  • 'Ribosomal_RNA_16S' (1,878 gene calls)

AVAILABLE FUNCTIONAL ANNOTATION SOURCES

  • Pfam (25,159,321 annotations)

AVAILABLE HMM SOURCES

  • 'Archaea_76' (76 models with 171,317 hits)
  • 'Bacteria_71' (71 models with 331,894 hits)
  • 'Protista_83' (83 models with 19,854 hits)
  • 'Ribosomal_RNA_12S' (1 model with 0 hits)
  • 'Ribosomal_RNA_16S' (3 models with 1,878 hits)
  • 'Ribosomal_RNA_18S' (1 model with 13 hits)
  • 'Ribosomal_RNA_23S' (2 models with 3,430 hits)
  • 'Ribosomal_RNA_28S' (1 model with 11 hits)
  • 'Ribosomal_RNA_5S' (5 models with 0 hits)

When I try it on a smaller contigs.db (1/4 of the samples), it completes without any problem. So I'm guessing the problem has to do with server capacity. My question is: is there any way to work around this issue? I assume the program finished the "Run an HMM search against KOfam" step. Is it possible to resume the program from here?

Thank you very much.

@meren
Member

meren commented Apr 17, 2024

This looks like a memory issue, so there is not much we can do. BUT, there is always a way. In this case, one could split the contigs-db file into 10 smaller ones using a collection-txt and anvi-split, run anvi-run-kegg-kofams on each of them separately, export the contents of the gene_functions table from each one, and then manually import the final hits into the original contigs-db.

But this is a hacker's workaround, and a machine with more memory would of course be the best solution :)
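
For anyone trying the splitting route, here is a minimal sketch of how the collection-txt for the first step might be generated. It is not part of anvi'o: the function names `assign_splits_to_bins` and `write_collection_txt`, the `chunk_NN` bin names, and the greedy balancing strategy are all made up for illustration. It assumes you already have a list of (split name, parent contig name) pairs for the contigs-db, and that a collection-txt is a headerless, TAB-delimited file with a split name and a bin name per line (worth double-checking against the anvi'o docs for your version).

```python
import heapq
from collections import defaultdict


def assign_splits_to_bins(split_parents, num_bins=10, prefix="chunk"):
    """Distribute splits over `num_bins` bins (hypothetical helper).

    `split_parents` is an iterable of (split_name, parent_contig_name) pairs.
    All splits of one contig end up in the same bin, and each contig goes to
    the bin that currently holds the fewest splits (greedy balancing).
    Returns a list of (split_name, bin_name) tuples.
    """
    by_contig = defaultdict(list)
    for split, parent in split_parents:
        by_contig[parent].append(split)

    # Min-heap of (number of splits assigned so far, bin index).
    heap = [(0, i) for i in range(num_bins)]
    heapq.heapify(heap)

    rows = []
    # Place contigs with the most splits first; ties broken by name for determinism.
    for parent in sorted(by_contig, key=lambda p: (-len(by_contig[p]), p)):
        size, idx = heapq.heappop(heap)
        bin_name = f"{prefix}_{idx + 1:02d}"
        rows.extend((split, bin_name) for split in by_contig[parent])
        heapq.heappush(heap, (size + len(by_contig[parent]), idx))
    return rows


def write_collection_txt(rows, path):
    """Write (split_name, bin_name) rows as a headerless TAB-delimited file
    (assumed collection-txt layout)."""
    with open(path, "w") as handle:
        for split, bin_name in rows:
            handle.write(f"{split}\t{bin_name}\n")
```

The resulting file would be the input for the collection import and split steps described above. Balancing by number of splits is only a rough proxy for memory use; balancing by gene calls per contig may be a better fit if those counts are at hand.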
