README.Rmd

---
output:
  github_document:
    html_preview: TRUE
editor_options: 
  chunk_output_type: console
---

<!-- README.md is auto-generated from README.Rmd. Do not edit README.md. -->

```{r, echo = FALSE, message = FALSE, error = FALSE, warning = FALSE}
library(tidyverse)
```
# neurips2021_multimodal_topmethods

This repository is a collection of top methods submitted to the OpenProblems / NeurIPS 2021 competition for multimodal single-cell data integration ([link](https://openproblems.bio/neurips_2021/)).

Through the pipelines and code contained in this repository, you should be able to replicate the obtained scores for each of the top submissions.

### List of methods

```{r readyaml, echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# get viash to generate a yaml of all components
out <- processx::run(
  command = "bin/viash", args = c("ns", "list", "-p", "docker"),
)
comp_yamls <- yaml::read_yaml(text = out$stdout)
```

```{r yamlasdf, echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# format relevant information as a data frame
comp_df <- map_df(seq_along(comp_yamls), function(i) {
  ya <- comp_yamls[[i]]
  auths <- ya$functionality$authors
  authors <- map_df(auths, function(aut) {
    tibble(
      name = aut$name,
      email = aut$email,
      roles = list(aut$roles),
      orcid = ifelse(is.null(aut$props$orcid), "", paste0(" <a href='https://orcid.org/", aut$props$orcid, "'><img src='resources/orcid_id.svg' height='16'></a>")),
      github = ifelse(is.null(aut$props$github), "", paste0(" <a href='https://github.com/", aut$props$github, "'><img src='resources/github_mark.svg' height='16'></a>")),
      url = ifelse(is.null(aut$props$url), name, paste0("<a href='", aut$props$url, "'>", name, "</a>")), 
      md = paste0(url, orcid, github)
    )
  })
  
  tibble(
    conf = ya$info$config,
    name = ya$functionality$name,
    namespace = ya$functionality$namespace %||% "",
    method_label = ya$functionality$info$method_label %||% "",
    submission_id = ya$functionality$info$submission_id %||% "",
    team_name = ya$functionality$info$team_name %||% "",
    project_url = ya$functionality$info$project_url %||% "",
    publication_doi = ya$functionality$info$publication_doi %||% "",
    publication_url = ya$functionality$info$publication_url %||% "",
    authors_text = paste0(authors$md, collapse = ", "),
    authors = list(authors %>% select(-md)),
    dir = gsub("/[^/]+/[^/]+$", "", conf)
  )
})

```

```{r echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# combine info final markdown table
comp_df %>% 
  filter(!grepl("_train$", name)) %>%
  arrange(namespace, name) %>% 
  transmute(
    Task = gsub("_methods", "", namespace),
    Team = team_name,
    Method = glue::glue("[{method_label}]({dir})"),
    Authors = authors_text
  ) %>% 
  knitr::kable()
```


## Dependencies

The dependencies of this repository are the same as those of the competition itself ([link](https://openproblems.bio/neurips_docs/submission/quickstart/#2-configure-your-local-environment)).

To run any of the methods, you need to download the required binaries and datasets and build the relevant docker containers first.
```
# download viash and nextflow
bin/init

# sync datasets to local
src/sync_datasets.sh

# build components and docker containers
bin/viash_build --max_threads 4
```

You can then rerun components by running the bash script located in the respective folders.

For example:

```bash
$ src/predict_modality/methods/cajal/test.sh

...

N E X T F L O W  ~  version 21.04.1
Pulling openproblems-bio/neurips2021_multimodal_viash ...
Launching `openproblems-bio/neurips2021_multimodal_viash` [trusting_woese] - revision: a28e0c22c5 [1.4.0]
executor >  local (5)
[4e/f4b91d] process > get_id_predictions (4)                                                            [100%] 4 of 4, cached: 3 ✔
[94/ffca9e] process > get_id_solutions (2)                                                              [100%] 4 of 4, cached: 4 ✔
[b3/92d144] process > bind_tsv_rows:bind_tsv_rows_process (meta_metric)                                 [100%] 1 of 1, cached: 1 ✔
[44/667c85] process > mse:mse_process (openproblems_bmmc_cite_phase2_PM_gex2adt)                        [100%] 4 of 4, cached: 3 ✔
[4f/8061cc] process > correlation:correlation_process (openproblems_bmmc_cite_phase2_PM_gex2adt)        [100%] 4 of 4, cached: 3 ✔
[ea/988a0f] process > check_format:check_format_process (openproblems_bmmc_cite_phase2_PM_gex2adt)      [100%] 4 of 4, cached: 3 ✔
[dc/393dca] process > final_scores:final_scores_process (output)                                        [100%] 1 of 1 ✔

[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]
```

After the bash script has finished running, output will have been stored in:

* `output/pretrain/<task-id>/<method-id>/<dataset-id>.output_pretrain`: Pre-trained models (if required).
* `output/predictions/<task-id>/<method-id>/<dataset-id>`: Predictions made by the method.
* `output/evaluation<method-id>/<dataset-id>`: Evaluation metrics.

Where <task-id> is one of three competition tasks (`predict_modality`, `match_modality` or `joint_embedding`).