---
output:
  github_document:
    html_preview: TRUE
editor_options:
  chunk_output_type: console
---
<!-- README.md is auto-generated from README.Rmd. Do not edit README.md. -->
```{r, echo = FALSE, message = FALSE, error = FALSE, warning = FALSE}
library(tidyverse)
```
# neurips2021_multimodal_topmethods
This repository is a collection of top methods submitted to the OpenProblems / NeurIPS 2021 competition for multimodal single-cell data integration ([link](https://openproblems.bio/neurips_2021/)).
Using the pipelines and code in this repository, you should be able to reproduce the scores obtained by each of the top submissions.
### List of methods
```{r readyaml, echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# get viash to generate a yaml of all components
out <- processx::run(
  command = "bin/viash",
  args = c("ns", "list", "-p", "docker")
)
comp_yamls <- yaml::read_yaml(text = out$stdout)
```
```{r yamlasdf, echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# format relevant information as a data frame
comp_df <- map_df(seq_along(comp_yamls), function(i) {
  ya <- comp_yamls[[i]]
  auths <- ya$functionality$authors
  authors <- map_df(auths, function(aut) {
    tibble(
      name = aut$name,
      email = aut$email,
      roles = list(aut$roles),
      orcid = ifelse(is.null(aut$props$orcid), "", paste0(" <a href='https://orcid.org/", aut$props$orcid, "'><img src='resources/orcid_id.svg' height='16'></a>")),
      github = ifelse(is.null(aut$props$github), "", paste0(" <a href='https://github.com/", aut$props$github, "'><img src='resources/github_mark.svg' height='16'></a>")),
      url = ifelse(is.null(aut$props$url), name, paste0("<a href='", aut$props$url, "'>", name, "</a>")),
      md = paste0(url, orcid, github)
    )
  })
  tibble(
    conf = ya$info$config,
    name = ya$functionality$name,
    namespace = ya$functionality$namespace %||% "",
    method_label = ya$functionality$info$method_label %||% "",
    submission_id = ya$functionality$info$submission_id %||% "",
    team_name = ya$functionality$info$team_name %||% "",
    project_url = ya$functionality$info$project_url %||% "",
    publication_doi = ya$functionality$info$publication_doi %||% "",
    publication_url = ya$functionality$info$publication_url %||% "",
    authors_text = paste0(authors$md, collapse = ", "),
    authors = list(authors %>% select(-md)),
    dir = gsub("/[^/]+/[^/]+$", "", conf)
  )
})
```
```{r echo = FALSE, warning = FALSE, error = FALSE, message = FALSE, results='asis'}
# combine the info into the final markdown table
comp_df %>%
  filter(!grepl("_train$", name)) %>%
  arrange(namespace, name) %>%
  transmute(
    Task = gsub("_methods", "", namespace),
    Team = team_name,
    Method = glue::glue("[{method_label}]({dir})"),
    Authors = authors_text
  ) %>%
  knitr::kable()
```
## Dependencies
The dependencies of this repository are the same as those of the competition itself ([link](https://openproblems.bio/neurips_docs/submission/quickstart/#2-configure-your-local-environment)).
To run any of the methods, first download the required binaries and datasets and build the relevant Docker containers.
```bash
# download viash and nextflow
bin/init
# sync datasets to local
src/sync_datasets.sh
# build components and docker containers
bin/viash_build --max_threads 4
```
You can then rerun components by running the bash script located in the respective folders.
For example:
```bash
$ src/predict_modality/methods/cajal/test.sh
...
N E X T F L O W ~ version 21.04.1
Pulling openproblems-bio/neurips2021_multimodal_viash ...
Launching `openproblems-bio/neurips2021_multimodal_viash` [trusting_woese] - revision: a28e0c22c5 [1.4.0]
executor > local (5)
[4e/f4b91d] process > get_id_predictions (4) [100%] 4 of 4, cached: 3 ✔
[94/ffca9e] process > get_id_solutions (2) [100%] 4 of 4, cached: 4 ✔
[b3/92d144] process > bind_tsv_rows:bind_tsv_rows_process (meta_metric) [100%] 1 of 1, cached: 1 ✔
[44/667c85] process > mse:mse_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[4f/8061cc] process > correlation:correlation_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[ea/988a0f] process > check_format:check_format_process (openproblems_bmmc_cite_phase2_PM_gex2adt) [100%] 4 of 4, cached: 3 ✔
[dc/393dca] process > final_scores:final_scores_process (output) [100%] 1 of 1 ✔
[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]
```
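The last line of the example log is a one-line JSON summary of the scores. As a minimal sketch, assuming the summary keeps the flat `"key":number` shape shown above, a single score can be pulled out with `sed`. The `extract_score` helper is hypothetical and not part of this repository; a proper JSON parser would be more robust than this regular expression.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): pull one numeric score
# out of the single-line JSON summary printed at the end of a run.
extract_score() {
  local key="$1" json="$2"
  # Keep only the number that follows "<key>": in the JSON line.
  printf '%s\n' "$json" | sed -n "s/.*\"${key}\":\([0-9.]*\).*/\1/p"
}

# Score summary copied from the example log above.
scores='[{"method_id":"cajal","ADT2GEX":0.3273,"ATAC2GEX":0.2172,"GEX2ADT":0.4613,"GEX2ATAC":0.178,"Overall":0.2959}]'

extract_score Overall "$scores"   # prints 0.2959
extract_score GEX2ADT "$scores"   # prints 0.4613
```

The helper prints nothing when the requested key is absent, so an empty result is a sign that the summary line had a different shape than assumed here.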
Once the bash script has finished, its output is stored in:
* `output/pretrain/<task-id>/<method-id>/<dataset-id>.output_pretrain`: Pre-trained models (if required).
* `output/predictions/<task-id>/<method-id>/<dataset-id>`: Predictions made by the method.
* `output/evaluation/<method-id>/<dataset-id>`: Evaluation metrics.

Here, `<task-id>` is one of the three competition tasks (`predict_modality`, `match_modality` or `joint_embedding`).
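
The layout above can be turned into a small path builder, which is handy when checking whether a run produced its predictions. This is a sketch only: `predictions_path` is a hypothetical helper, and `example_dataset` is a placeholder rather than a real dataset id.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): build the predictions
# directory for one run, following the layout
#   output/predictions/<task-id>/<method-id>/<dataset-id>
predictions_path() {
  local task_id="$1" method_id="$2" dataset_id="$3"
  # Accept only the three competition tasks named in this README.
  case "$task_id" in
    predict_modality|match_modality|joint_embedding) ;;
    *) echo "unknown task: $task_id" >&2; return 1 ;;
  esac
  printf 'output/predictions/%s/%s/%s\n' "$task_id" "$method_id" "$dataset_id"
}

# "example_dataset" is a placeholder, not a real dataset id.
path="$(predictions_path predict_modality cajal example_dataset)"
if [ -e "$path" ]; then
  echo "found: $path"
else
  echo "missing: $path"
fi
```

Rejecting unknown task ids up front keeps a typo from silently producing a path that can never exist.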