NCBI Datasets

NCBI Datasets is a resource that lets you easily gather data from across NCBI databases. You can use it to find and download sequence, annotation, and metadata for genes and genomes using our command-line interface (CLI) tools or the NCBI Datasets web interface.

NCBI Datasets tools are under active development. To submit feedback, please create a GitHub issue or contact NCBI directly with your questions, comments or feature requests.

⚠️ The NCBI Datasets command-line tools (CLI) v13.x and older, as well as the API v1, will be deprecated in June 2024 and then retired in December 2024. Please download and install the latest version using the instructions below.

Install the Datasets command-line tools

Install the latest version (CLI v16.x) of the NCBI Datasets CLI tools, datasets and dataformat, using conda:

conda install -c conda-forge ncbi-datasets-cli

For other installation options, see our CLI tools download and install instructions.
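
After installing, it can help to confirm that both tools are on your PATH. The snippet below is a minimal sketch that relies only on the POSIX `command -v` builtin; the `have_tool` helper name is our own illustration and is not part of the Datasets CLI.

```shell
# have_tool is a hypothetical helper, not part of the Datasets CLI.
# It succeeds (exit status 0) when the named program is found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Check both tools installed by the ncbi-datasets-cli package.
for tool in datasets dataformat; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found on PATH"
  fi
done
```

If either tool is reported as not found, check that the conda environment you installed into is active.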

Use the Datasets command-line tools

Use datasets to download biological sequence data across all domains of life from NCBI.

Use dataformat to convert metadata included as part of the data package from JSON Lines format to other formats, such as TSV.

Examples:

Use datasets to download a genome data package for the human reference genome GRCh38:

datasets download genome taxon human --reference --filename human-reference.zip

Use dataformat to extract selected fields of metadata from the downloaded data package for the human reference genome, GRCh38:

dataformat tsv genome --package human-reference.zip --fields organism-name,assminfo-name,accession,assminfo-submitter
Organism name	Assembly Name	Assembly Accession	Assembly Submitter
Homo sapiens	GRCh38.p14	GCF_000001405.40	Genome Reference Consortium
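
Because the output is tab-separated, it can be passed to standard text tools. The following is a small sketch, not an official recipe: it stores the two lines shown above in a shell variable (instead of calling dataformat, so that it runs without a downloaded package) and uses awk to pull out the Assembly Accession column. The variable names are illustrative.

```shell
# Sample TSV copied from the dataformat output above (tab-separated).
tsv=$(printf 'Organism name\tAssembly Name\tAssembly Accession\tAssembly Submitter\nHomo sapiens\tGRCh38.p14\tGCF_000001405.40\tGenome Reference Consortium\n')

# Skip the header row (NR > 1) and print the third column, Assembly Accession.
accessions=$(printf '%s\n' "$tsv" | awk -F '\t' 'NR > 1 { print $3 }')

echo "$accessions"
# prints: GCF_000001405.40
```

In practice you would pipe the dataformat command itself into awk rather than using a stored sample.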

The Datasets CLI schematic below also outlines the available commands for the datasets CLI.

Download large numbers of genomes

Download large numbers of genomes in three steps, starting from a dehydrated zip archive:

Download the dehydrated zip archive
Unzip the downloaded zip archive
Rehydrate to access the data

Try this example for the human reference genome:

Download the dehydrated zip archive:
datasets download genome accession GCF_000001405.40 --dehydrated --filename human_GRCh38_dataset.zip
Unzip the downloaded zip archive:
unzip human_GRCh38_dataset.zip -d my_human_dataset
Rehydrate to access the data:
datasets rehydrate --directory my_human_dataset/
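
The three steps can also be wrapped in a small script. This is only a sketch under a few assumptions: the `run` and `fetch_genome` helpers and the `DRY_RUN` variable are hypothetical names of our own, and the archive name is derived from the output directory rather than matching the file name used above. With `DRY_RUN=1` the script prints the commands instead of running them, so you can review the plan before downloading anything.

```shell
# run is a hypothetical helper: it prints the command when DRY_RUN=1,
# otherwise it executes the command.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_genome is a hypothetical wrapper around the three steps above.
# Arguments: an assembly accession and an output directory.
fetch_genome() {
  accession=$1
  outdir=$2
  archive="${outdir}.zip"   # assumption: name the archive after the directory

  # Each step runs only if the previous one succeeded.
  run datasets download genome accession "$accession" --dehydrated --filename "$archive" &&
  run unzip "$archive" -d "$outdir" &&
  run datasets rehydrate --directory "$outdir/"
}

# Dry run: capture and show the planned commands without executing them.
DRY_RUN=1
plan=$(fetch_genome GCF_000001405.40 my_human_dataset)
echo "$plan"
```

Set `DRY_RUN=0` (or leave it unset) to actually run the download, unzip, and rehydrate steps.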

For more information, see how to download large genome data packages.

Datasets data packages

NCBI Datasets provides sequence, annotation, metadata and other biological data as NCBI Datasets Data Package zip archives.

We currently offer four types of data package:

An NCBI Datasets Gene Data Package
An NCBI Datasets Genome Data Package
A specialized NCBI Datasets Virus Data Package
An NCBI Datasets Taxonomy Data Package

Datasets data reports

NCBI Datasets data packages include data report files that contain metadata about the requested records. Data report schemas describe each type of data report, including available fields, with descriptions and examples.
