metagenome-atlas/Tutorial

Metagenome-Atlas Tutorial

This is a tutorial for Metagenome-Atlas, an easy-to-use pipeline for analyzing metagenomic data. It handles all steps from quality control (QC) and assembly to binning and annotation.

⁉️ If you have any questions or run into errors, write to us.

Setup

Go to the setup page and follow the instructions.

Analyze the output of Atlas

Before installing a program, I usually want to make sure that it produces the output I need. We therefore start by analyzing the output of Metagenome-Atlas.

We prepared an interactive R Markdown notebook with the code for differential analysis.

✨ Follow this link to the interactive tutorial.

Here is another tutorial based on human samples, using only the reports.

Install and run atlas with three commands

In this part of the tutorial you will install Metagenome-Atlas, either in GitHub Codespaces or on your server, and test it with a small dataset. Because a real metagenomic assembly can require more than 250 GB of RAM and multiple processors, you would ideally do this directly on a high-performance system, e.g. your university's cluster. You can install Miniconda in your home directory if it is not installed on your system.

Follow this link

See also the get started section in the documentation.

Use this code for your project

First, clone this git repository.

Copy atlas files to your local machine.

I made some handy scripts to copy the most important atlas output files from a server to your local machine. Because the output files might change between atlas versions, I use the file atlas_output_files.yaml to specify them. Check which atlas version listed there is closest to the version you used.

You can run get_atlas_files.py or get_atlas_files.R to do this.

The Python script asks for the following information and stores it in .connection_details.yaml.

    "output_dir": 'atlas_data',
    "atlas_version": "v2.17",
    "username": "me",
    "server": "myserver.server.com",
    "base_path_server": '/home/user/my_atlas_run',
    "private_key_path": None # "C:/Users/User/.ssh/id_rsa"

For the R script you need to hard code them into the script.
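
To make the role of each field concrete, here is a minimal sketch of how such connection details could be turned into a copy command. It is not taken from get_atlas_files.py, which may transfer files differently. The helper `build_scp_command` and the relative path `path/to/example_file.tsv` are hypothetical and used only for illustration. The sketch only assembles an `scp` command and does not run it.

```python
import posixpath
import shlex

# Example values mirroring the fields listed above (all placeholders).
connection = {
    "output_dir": "atlas_data",
    "atlas_version": "v2.17",
    "username": "me",
    "server": "myserver.server.com",
    "base_path_server": "/home/user/my_atlas_run",
    "private_key_path": None,  # e.g. "C:/Users/User/.ssh/id_rsa"
}


def build_scp_command(details, relative_path):
    """Assemble (but do not run) an scp command for one file.

    Hypothetical helper: relative_path is interpreted relative to
    base_path_server on the server and to output_dir locally.
    """
    remote_path = posixpath.join(details["base_path_server"], relative_path)
    remote = f'{details["username"]}@{details["server"]}:{remote_path}'
    local = posixpath.join(details["output_dir"], relative_path)

    command = ["scp"]
    if details["private_key_path"] is not None:
        # -i selects the private key (identity file) used for authentication
        command += ["-i", details["private_key_path"]]
    command += [remote, local]
    return command


# Placeholder path, not an actual atlas output file
command = build_scp_command(connection, "path/to/example_file.tsv")
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` once the local target directory exists. Joining local paths with `posixpath` is a simplification that assumes forward slashes are acceptable on your machine.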

⚠️ Some output atlas files might be very large, e.g. the gene catalog.

Use files specified in the atlas_output_files.yaml

This approach may look complicated, but it is a generic way to access the atlas files. Alternatively, you can simply copy the paths specified in atlas_output_files.yaml.

In R you can use

data_dir <- "atlas_data" # path specified as output_dir in the get_atlas_files script
atlas_version <- "v2.17"
file_config_files <- "../atlas_output_files.yaml"

files <- yaml::yaml.load_file(file_config_files)[[atlas_version]]

for (key1 in names(files)) {
  value1 <- files[[key1]]
  if (is.character(value1)) {
    # It's a direct path
    files[[key1]] <- file.path(data_dir, value1)
  } else if (is.list(value1)) {
    # It's a nested list, go deeper
    for (key2 in names(value1)) {
      value2 <- value1[[key2]]
      files[[key1]][[key2]] <- file.path(data_dir, value2)
    }
  }
}


taxonomy_file <- files[["genomes"]][["taxonomy"]]
tree_file <- files[["genomes"]][["tree_bacteria"]]
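
If you prefer Python, the same path resolution can be sketched as follows. This is a rough equivalent of the R loop above, not code from the repository. The function name `resolve_paths` is hypothetical, and the entries in the example mapping are placeholders rather than real atlas output paths. Only the keys `genomes`, `taxonomy`, and `tree_bacteria` are taken from the R example. To keep the sketch free of third-party dependencies, it works on an already parsed mapping; loading the YAML file with PyYAML is shown only as a comment.

```python
import os


def resolve_paths(entries, data_dir):
    """Prefix every relative path in a (possibly nested) mapping with data_dir."""
    resolved = {}
    for key, value in entries.items():
        if isinstance(value, str):
            # A direct path
            resolved[key] = os.path.join(data_dir, value)
        elif isinstance(value, dict):
            # A nested mapping: recurse, so any nesting depth is handled
            resolved[key] = resolve_paths(value, data_dir)
        else:
            # Leave anything else (e.g. None) untouched
            resolved[key] = value
    return resolved


# Assuming PyYAML is installed, the mapping could be read from the real file:
#   import yaml
#   with open("../atlas_output_files.yaml") as handle:
#       entries = yaml.safe_load(handle)["v2.17"]

# Placeholder entries for illustration only
entries = {
    "genomes": {
        "taxonomy": "placeholder/taxonomy.tsv",
        "tree_bacteria": "placeholder/tree.nwk",
    },
    "example_table": "placeholder/table.tsv",
}

data_dir = "atlas_data"  # path given as output_dir in the get_atlas_files script
files = resolve_paths(entries, data_dir)

taxonomy_file = files["genomes"]["taxonomy"]
tree_file = files["genomes"]["tree_bacteria"]
print(taxonomy_file)
print(tree_file)
```

Unlike the R loop, which handles exactly two levels, the recursive version follows nested mappings to any depth. That makes no difference for a file with at most two levels.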