The GTDB-Tk reference data does not exist or is corrupted. #588

Open
vichy123 opened this issue May 9, 2024 · 4 comments

Comments

@vichy123

vichy123 commented May 9, 2024

Hello, I have installed GTDB-Tk v2.3.2, downloaded the release214 reference data, and set GTDBTK_DATA_PATH, but it still reports "The GTDB-Tk reference data does not exist or is corrupted."

gtdbtk check_install
[2024-05-09 14:57:20] INFO: GTDB-Tk v2.3.2
[2024-05-09 14:57:20] INFO: gtdbtk check_install

================================================================================
ERROR


      The 'GTDBTK_DATA_PATH' environment variable is not defined.           

        Please set this variable to your reference data package.            
       https://ecogenomics.github.io/GTDBTk/installing/index.html           

================================================================================
[2024-05-09 14:57:20] ERROR: Controlled exit resulting from early termination.
(metachip2env) adruan@adruan-Precision-7820-Tower:$ conda env config vars set GTDBTK_DATA_PATH="/newdisk/metabolic/db/release214"
To make your changes take effect please reactivate your environment

Vichy:
gtdbtk check_install
[2024-05-09 21:09:06] INFO: GTDB-Tk v2.3.2
[2024-05-09 21:09:06] INFO: gtdbtk check_install

================================================================================
ERROR


       The GTDB-Tk reference data does not exist or is corrupted.           
           GTDBTK_DATA_PATH=~/newdisk/metabolic/db/release214               

Please compare the checksum to those provided in the download repository.
https://github.com/Ecogenomics/GTDBTk#gtdb-tk-reference-data

[2024-05-09 21:09:06] ERROR: Controlled exit resulting from early termination.
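
A minimal shell sketch, not part of GTDB-Tk, for telling apart the situations seen in the two outputs above: the variable being empty, the value starting with a literal ~ (which is how the second message prints it), and the path not being a directory. The function name `check_data_path` is hypothetical, and the checks only look at the path itself; they say nothing about whether the reference data inside it is complete.

```shell
# Hypothetical helper: classify the value of GTDBTK_DATA_PATH.
check_data_path() {
  case "$1" in
    "")   echo "unset" ;;            # variable empty or not defined
    "~"*) echo "literal-tilde" ;;    # ~ was stored as text, not expanded
    *)    if [ -d "$1" ]; then
            echo "ok"                # path exists and is a directory
          else
            echo "missing"           # path does not exist in this shell
          fi ;;
  esac
}

# Inspect the value the current shell would pass on to gtdbtk.
check_data_path "${GTDBTK_DATA_PATH:-}"
```

Running this in the reactivated environment, before `gtdbtk check_install`, shows which of the cases applies.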

@pchaumeil
Collaborator

Hello,
Are you storing the GTDB-Tk database in your home directory?

If you use the tilde (~) when exporting your environment variable, do not wrap the path in double quotes, because the shell does not expand ~ inside quotes:
GTDBTK_DATA_PATH=~/newdisk/metabolic/db/release214

However, I would recommend exporting the full path to the database instead:
GTDBTK_DATA_PATH=/path/to/your/homedirectory/newdisk/metabolic/db/release214

Regards,
Pierre
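
The quoting behaviour described above can be seen directly in the shell. This is only an illustration using the path from this thread; the variable names `quoted`, `unquoted` and `with_home` are made up for the example.

```shell
# Inside double quotes the tilde stays a literal character.
quoted="~/newdisk/metabolic/db/release214"

# Unquoted, at the start of an assignment value, the shell expands it
# to the current user's home directory.
unquoted=~/newdisk/metabolic/db/release214

# $HOME is expanded inside double quotes, so this form is safe to quote.
with_home="$HOME/newdisk/metabolic/db/release214"

printf 'quoted:    %s\n' "$quoted"
printf 'unquoted:  %s\n' "$unquoted"
printf 'with HOME: %s\n' "$with_home"
```

Only the quoted form keeps the literal ~, which most programs reading the variable will not expand on their own.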

@vichy123
Author

vichy123 commented May 10, 2024 via email

@Arkadiy-Garber

Arkadiy-Garber commented May 17, 2024

I am getting the same error:

(base) ark@TheBelly:~/bin/GTDBTk/data$ gtdbtk 

================================================================================
                                     ERROR                                      
________________________________________________________________________________

           The GTDB-Tk reference data does not exist or is corrupted.           
                   GTDBTK_DATA_PATH=/home/ark/bin/GTDBTk/data                   

   Please compare the checksum to those provided in the download repository.    
          https://github.com/Ecogenomics/GTDBTk#gtdb-tk-reference-data          
================================================================================



(base) ark@TheBelly:~/bin/GTDBTk/data$ ls
ar53_marker_genes_all_r220  ar53.tree.gz                  citations.dmp          gtdb_vs_ncbi_archaea.xlsx   METHODS.txt                 sp_clusters.tsv
ar53_metadata.tsv.gz        bac120_marker_genes_all_r220  delnodes.dmp           gtdb_vs_ncbi_bacteria.xlsx  names.dmp                   synonyms.ar53.tsv
ar53_msa_marker_info.tsv    bac120_metadata.tsv.gz        division.dmp           hq_mimag_genomes.tsv        ncbi_vs_gtdb_archaea.xlsx   synonyms.bac120.tsv
ar53_msa_mask.txt           bac120_msa_marker_info.tsv    FILE_DESCRIPTIONS.txt  images.dmp                  ncbi_vs_gtdb_bacteria.xlsx  VERSION.txt
ar53.sp_labels.tree         bac120_msa_mask.txt           gc.prt                 individual                  nodes.dmp
ar53_taxonomy.tsv           bac120.sp_labels.tree         gencode.dmp            MD5SUM.txt                  qc_failed.tsv
ar53_taxonomy.tsv.gz        bac120_taxonomy.tsv           gtdb.dic               merged.dmp                  readme.txt
ar53.tree                   bac120.tree                   gtdbtk_package         metadata_field_desc.tsv     RELEASE_NOTES.txt

Is there something missing from this directory?

@pchaumeil
Collaborator

Hello,
Please read the documentation about downloading the GTDB-Tk package:

https://ecogenomics.github.io/GTDBTk/installing/index.html#gtdb-tk-reference-data

You do not need to download all files from the GTDB website, only the GTDB-Tk archive.
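
Once the GTDB-Tk archive has been downloaded, the checksum comparison that the error message asks for could look roughly like this. It assumes GNU `md5sum` is available and that the download repository publishes an MD5 digest for the archive; the function name `verify_md5` and the placeholder file name and digest in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed only if the file's MD5 digest matches.
verify_md5() {
  # $1: file to check; $2: expected MD5 hex digest
  actual=$(md5sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}

# Usage (placeholders, replace with the real archive and published digest):
#   verify_md5 /path/to/archive.tar.gz <expected-digest> \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

A mismatch would point to an incomplete or corrupted download rather than a problem with GTDBTK_DATA_PATH.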
