Merge pull request #27 from telatin/main

Update vscode + Add nf-core
MRC-CLIMB · Mar 19, 2024 · b93ed5f
2 parents 4224c57 + f003890
Showing 6 changed files with 121 additions and 8 deletions.
diff --git a/docs/img/nf-fetch-run.png b/docs/img/nf-fetch-run.png
diff --git a/docs/index.md b/docs/index.md
@@ -47,5 +47,9 @@ A simple walk-through of some CLIMB-BIG-DATA functionality.
 [QIIME 2](walkthroughs/qiime2.md)  
 How to install QIIME 2 on a notebook server and basic usage.
 
+[nf-core pipelines](walkthroughs/nfcore.md)  
+How to run some of the nf-core pipelines on CLIMB notebooks.
+
 [How to fix login error 403](notebook-servers/403-forbidden-error.md)  
-An explanation of how to resolve login error 403 when accessing notebooks.
+An explanation of how to resolve login error 403 when accessing notebooks.
+
diff --git a/docs/notebook-servers/index.md b/docs/notebook-servers/index.md
@@ -20,8 +20,8 @@ How to install software using Conda, in the context of a containerized environme
 [Using Nextflow](using-nextflow.md)  
 How to use Nextflow with CLIMB-BIG-DATA.
 
-[Using VS Code](using-vscode.md)
-How to connect to your CLIMB Notebook using VS Code
+[Using Visual Studio Code](using-vscode.md)  
+How to connect to your CLIMB Notebook and work from Visual Studio Code.
 
 [How to fix login error 403](403-forbidden-error.md)  
 An explanation of how to resolve login error 403 when accessing notebooks.
diff --git a/docs/notebook-servers/using-vscode.md b/docs/notebook-servers/using-vscode.md
@@ -1,7 +1,12 @@
-# Editing with VS Code
+# Working on your notebook from Visual Studio Code
 
 Visual Studio Code, or VS Code, is a very popular [IDE](https://aws.amazon.com/what-is/ide/). 
-In this tutorial we will see how to use VS code to connect to your CLIMB Jupyter Notebook
+In this tutorial we will see how to use VS Code to connect to your CLIMB Jupyter Notebook.
+
+You will configure your CLIMB BIG DATA notebook to accept a connection from
+your local installation of Visual Studio Code. This lets you use the Visual Studio Code editor, terminal, and drag-and-drop file sidebar instead of the notebook web interface, which is great but limited.
+
+This is a tutorial for advanced users who want to integrate their workflow with their CLIMB BIG DATA notebook.
 
 <!-- prettier-ignore -->
 !!! Prerequisites
@@ -11,6 +16,11 @@ In this tutorial we will see how to use VS code to connect to your CLIMB Jupyter
 
 ## Install "Tunnels"
 
+<!-- prettier-ignore -->
+!!! Further reading
    The enabling technology of this tutorial is described in the [Developing with Remote Tunnels](https://code.visualstudio.com/docs/remote/tunnels) page of the Visual Studio Code documentation.
    Note that the advice in the section *How can I ensure I keep my tunnel running?* does not work on notebooks.
+
 Inside your local VS Code, install the extension `Remote - Tunnels` by Microsoft.
 
 After you install it, go to your *Remotes* tab and login using your GitHub account (you should see a *Sign in using your GitHub account* item in the menu).
@@ -71,4 +81,12 @@ Open this link in your browser https://vscode.dev/tunnel/jupyter-telatin-2enxf
 
 When you are done, you can either click on the link provided on your terminal, or refresh
 your tunnels list in your *local* VS Code, and as shown in the image above, you should see the
-`jupyter-groupname-id` (or custom name you gave)
+`jupyter-groupname-id` (or the custom name you gave it).
+
+## What you can do now
+
+1. Your Visual Studio Code **terminal** now runs on your CLIMB notebook: you will find your notebook's paths and conda environments, and commands will execute on the notebook.
+2. Your file explorer shows your CLIMB files, and you can upload and download files by dragging and dropping them in the left sidebar.
+3. Most notably, you can edit and view your CLIMB BIG DATA files in the Visual Studio Code editor, with its syntax highlighting, multi-cursor editing, extensions and other features.
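+
+To check that a Visual Studio Code terminal is really running on your notebook, you can print where it is. This is a minimal sketch: `where_am_i` is a hypothetical helper, and finding `/shared/team` (the shared storage path used in the nf-core walkthrough) is only a hint, not proof, that you are on CLIMB.
+
+```bash
#!/usr/bin/env bash
# Sketch: print where this terminal is running (read-only, changes nothing).
# where_am_i is a hypothetical helper; /shared/team is the shared path used in
# the CLIMB BIG DATA docs, so its presence is a hint, not proof, of being on CLIMB.
set -uo pipefail

where_am_i() {
  echo "host:  $(uname -n)"
  echo "user:  $(id -un)"
  echo "cwd:   $(pwd)"
  # CONDA_DEFAULT_ENV is set by conda when an environment is active.
  echo "conda: ${CONDA_DEFAULT_ENV:-<none active>}"
  if [[ -d /shared/team ]]; then
    echo "shared storage: /shared/team found (likely your CLIMB notebook)"
  else
    echo "shared storage: /shared/team not found (likely a local terminal)"
  fi
}

where_am_i
+```
+
+Run it once in a local terminal and once in the tunnelled terminal: the host name and the shared storage line should differ.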
+
+
diff --git a/docs/walkthroughs/nfcore.md b/docs/walkthroughs/nfcore.md
@@ -0,0 +1,90 @@
+# Running nf-core pipelines
+
+## What are nf-core pipelines?
+
+[nf-core](https://nf-co.re/) is a community-driven, international effort to create high-quality,
+reproducible pipelines written in [Nextflow](https://nextflow.io/).
+
+Some examples of nf-core pipelines include:
+
+* [nf-core/fetchngs](https://nf-co.re/fetchngs/): to download raw datasets from public repositories (ENA, SRA...)
+* [nf-core/rnaseq](https://nf-co.re/rnaseq/): to perform quality control, alignment and gene expression quantification of RNA-Seq datasets
+* [nf-core/ampliseq](https://nf-co.re/ampliseq/): to analyse metabarcoding (16S, ITS...) experiments (using DADA2 and QIIME 2, among other tools)
+* [nf-core/taxprofiler](https://nf-co.re/taxprofiler/): to run multiple taxonomic profiling tools on metagenomic datasets
+* [nf-core/mag](https://nf-co.re/mag/): to assemble and bin whole metagenome sequencing runs
+
+See the full list [online](https://nf-co.re/pipelines).
+
+## How to run an nf-core pipeline?
+
+The nf-core website provides excellent [documentation](https://nf-co.re/docs),
+as well as a great set of video tutorials.
+
+When trying a pipeline for the first time, run it with its *test* profile. The pipeline will then
+analyse a small test dataset known to work, and once that run completes successfully we can move on to our own data.
+
+The general syntax is:
+```text
nextflow run nf-core/<pipeline_name> -r <version> -profile test --outdir /shared/team/<output-dir>
+```
+
+Where:
+
+* `<pipeline_name>` is the name of the pipeline you want to run
+* `<version>` is the revision you want to use (pinning it is important for reproducibility; check the pipeline website for the latest version)
+* `<output-dir>` is where Nextflow will save the results. **NOTE** that your home directory will not work: use a path under `/shared/team/` instead.
+
+For example, to test the `rnaseq` pipeline:
+
+```console
+nextflow run nf-core/rnaseq -r 3.14.0 -profile test --outdir /shared/team/test-out-rnaseq
+```
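+
+The general syntax can also be assembled from variables before launching. The sketch below is a minimal example, not part of nf-core: the function name `nfcore_test_cmd` is hypothetical, and it only *prints* the command (it does not run Nextflow), refusing output directories inside your home directory.
+
+```bash
#!/usr/bin/env bash
# Sketch: assemble the generic nf-core test command from three values.
# nfcore_test_cmd is a hypothetical helper, not an nf-core or Nextflow tool.
set -euo pipefail

nfcore_test_cmd() {
  local pipeline="$1" version="$2" outdir="$3"
  # Your home directory will not work as an output location, so reject it early.
  if [[ -n "${HOME:-}" && ( "$outdir" == "$HOME" || "$outdir" == "$HOME"/* ) ]]; then
    echo "error: choose an output directory outside $HOME, e.g. under /shared/team/" >&2
    return 1
  fi
  # Print instead of executing, so the command can be reviewed first.
  echo "nextflow run nf-core/${pipeline} -r ${version} -profile test --outdir ${outdir}"
}

nfcore_test_cmd rnaseq 3.14.0 /shared/team/test-out-rnaseq
+```
+
+Running it prints the same `rnaseq` test command shown above; once it looks right, copy it to the terminal.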
+
+## An example: fetchngs
+
+`nf-core/fetchngs` is a pipeline to download NGS data (FASTQ files and metadata) from public repositories such as the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra).
+
+It makes a good first pipeline to try, because its input is simply a text file listing accession codes.
+
+Since Nextflow pipelines cannot access files saved in your home directory, we create the input file in `/shared/team/`:
+
+```console
+mkdir -p /shared/team/download-lists/
+echo -e "ERR12319563\nERR12319484\nERR12319547" > /shared/team/download-lists/test.csv
+```
+
+<!-- prettier-ignore -->
+!!! Edit the list
    The `echo` command creates a list of three accession numbers from the command line,
    but you can also create the file with the text editor built into the CLIMB notebook.
    Make sure it has the `.csv` extension, though.
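+
+Before launching the pipeline, you may want to check that the list has one identifier per line. This is a conservative, minimal sketch: `check_accession_list` is a hypothetical helper that only rejects empty lines and lines containing spaces, tabs or Windows carriage returns; it does not check whether the accessions exist.
+
+```bash
#!/usr/bin/env bash
# Sketch: sanity-check an accession list before passing it to nf-core/fetchngs.
# check_accession_list is a hypothetical helper; it only checks that every line
# holds a single identifier with no blanks, spaces, tabs or carriage returns.
set -euo pipefail

check_accession_list() {
  local list="$1" n=0 line ws='[[:space:]]'
  # The "|| [[ -n ... ]]" part also reads a last line with no trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    if [[ -z "$line" || "$line" =~ $ws ]]; then
      echo "line $n is empty or contains whitespace: '$line'" >&2
      return 1
    fi
  done < "$list"
  echo "$n identifiers in $list"
}

# Defaults to the example list created above; pass another path as $1.
LIST="${1:-/shared/team/download-lists/test.csv}"
if [[ -f "$LIST" ]]; then
  check_accession_list "$LIST"
else
  echo "list not found: $LIST" >&2
fi
+```
+
+For the list created with `echo` above, it should report three identifiers.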
+
+```bash
# A \ at the end of a line lets you split a command across multiple lines
# If you type the command on a single line, do NOT type the "\"s
+
+nextflow run nf-core/fetchngs -r 1.12.0 \
+   --input /shared/team/download-lists/test.csv \
+   --outdir /shared/team/fetchngs-out/
+```
+
+Example execution:
+
+<img src="../img/nf-fetch-run.png" alt="nf-core fetchngs execution" height="500">
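+
+Once the run has finished, you can get a quick overview of what was downloaded. The sketch below assumes the output layout described in the fetchngs documentation (FASTQ files under `fastq/` and a samplesheet at `samplesheet/samplesheet.csv`); `summarise_fetchngs` is a hypothetical helper, and you should check the *Output* page of the release you ran.
+
+```bash
#!/usr/bin/env bash
# Sketch: summarise a finished nf-core/fetchngs run.
# Assumes FASTQ files under fastq/ and a samplesheet at
# samplesheet/samplesheet.csv; verify this against the release you ran.
set -euo pipefail

summarise_fetchngs() {
  local outdir="$1" n_fastq=0
  local sheet="$outdir/samplesheet/samplesheet.csv"
  if [[ -d "$outdir/fastq" ]]; then
    # Count gzipped FASTQ files anywhere below fastq/.
    n_fastq=$(find "$outdir/fastq" -type f -name '*.fastq.gz' | wc -l)
  fi
  # $(( )) also strips any padding spaces that some wc versions add.
  echo "FASTQ files: $((n_fastq))"
  if [[ -f "$sheet" ]]; then
    # The first line is a header; every other line is a data row.
    echo "Samplesheet rows: $(( $(wc -l < "$sheet") - 1 ))"
  else
    echo "No samplesheet found at $sheet"
  fi
}

# Defaults to the --outdir used in the fetchngs example above.
summarise_fetchngs "${1:-/shared/team/fetchngs-out}"
+```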
+
+## S3 buckets
+
+A very handy feature of Nextflow is that it can read from and write to S3 buckets.
+
+To save the output of the nf-core/fetchngs pipeline to a CLIMB S3 bucket
+(suppose you have a bucket called "ngs-files"),
+simply change the output path to something like:
+
+```bash
# A \ at the end of a line lets you split a command across multiple lines
# If you type the command on a single line, do NOT type the "\"s
+
+nextflow run nf-core/fetchngs -r 1.12.0 \
+   --input /shared/team/download-lists/test.csv \
+   --outdir s3://ngs-files/fetchngs-output/
+```
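+
+If you switch between local and S3 output often, a small wrapper can choose the `--outdir` for you. This is a minimal sketch: `BUCKET` and `fetchngs_outdir` are hypothetical names, and the script only echoes the `nextflow run` command rather than launching it.
+
+```bash
#!/usr/bin/env bash
# Sketch: pick the --outdir for nf-core/fetchngs from an optional bucket name.
# BUCKET and fetchngs_outdir are hypothetical: set BUCKET (e.g. BUCKET=ngs-files)
# to write to S3, or leave it unset to write under /shared/team as before.
set -euo pipefail

fetchngs_outdir() {
  local bucket="${1:-}"
  if [[ -n "$bucket" ]]; then
    echo "s3://${bucket}/fetchngs-output/"
  else
    echo "/shared/team/fetchngs-out/"
  fi
}

OUTDIR=$(fetchngs_outdir "${BUCKET:-}")
# Print the command for review; remove the leading "echo" to actually launch it.
echo nextflow run nf-core/fetchngs -r 1.12.0 \
  --input /shared/team/download-lists/test.csv \
  --outdir "$OUTDIR"
+```
+
+Running it with `BUCKET=ngs-files` set prints the S3 variant of the command shown above.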
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -50,10 +50,11 @@ nav:
       - "Understanding storage": "storage/index.md"
       - "Installing software with Conda": "notebook-servers/installing-software-with-conda.md"
       - "Using Nextflow": "notebook-servers/using-nextflow.md"
-      - "Using VS Code": "notebook-servers/using-vscode.md"
+      - "Using Visual Studio Code": "notebook-servers/using-vscode.md"
       - "403 Forbidden Error": "notebook-servers/403-forbidden-error.md"
   - "Walkthroughs":
       - "Metagenomics": "walkthroughs/metagenomics-tutorial.md"
       - "Genome assembly": "walkthroughs/genome-assembly/spades.md"
       - "Custom Nextflow Workflows": "walkthroughs/nextflow-custom-workflows/nextflow-custom.md"
-      - "QIIME 2": "walkthroughs/qiime2.md"
+      - "QIIME 2": "walkthroughs/qiime2.md"
+      - "nf-core pipelines": "walkthroughs/nfcore.md"