fix container workflow (#138)
zktuong committed Mar 17, 2022
1 parent 4b1303e commit 6b13d96
Showing 2 changed files with 12 additions and 63 deletions.
69 changes: 9 additions & 60 deletions .github/workflows/singularity_container-install.yml
@@ -11,45 +11,16 @@ on:
 branches:
 - "*"
 jobs:
-changes:
-name: "Changed files"
+container:
 runs-on: ubuntu-latest
-outputs:
-changed_file: ${{ steps.files.outputs.added_modified }}
 steps:
-- id: files
-uses: jitterbit/get-changed-files@b17fbb00bdc0c0f63fcf166580804b4d2cdc2a42
-with:
-format: 'json'
-
-build-test-containers:
-needs:
-- changes
-runs-on: ubuntu-latest
-strategy:
-# Keep going on other deployments if anything bloops
-fail-fast: True
-matrix:
-changed_file: ${{ fromJson(needs.changes.outputs.changed_file) }}
-
-name: Check changed files
-steps:
-- name: Continue if file name contains the word 'container'
-run: |
-# Continue if we have a changed Singularity recipe
-echo ${{ matrix.changed_file }}
-if [[ "${{ matrix.changed_file }}" = *container* ]]; then
-echo "keepgoing=true" >> $GITHUB_ENV
-fi
 - name: Set up Go 1.13
-if: ${{ env.keepgoing == 'true' }}
 uses: actions/setup-go@v1
 with:
 go-version: 1.13
 id: go
 
 - name: Install Dependencies
-if: ${{ env.keepgoing == 'true' }}
 run: |
 sudo apt-get update && sudo apt-get install -y \
 build-essential \
@@ -60,11 +31,9 @@ jobs:
 libseccomp-dev \
 pkg-config
 - name: Install Singularity
-if: ${{ env.keepgoing == 'true' }}
 env:
 SINGULARITY_VERSION: 3.8.1
 GOPATH: /tmp/go
-
 run: |
 mkdir -p $GOPATH
 sudo mkdir -p /usr/local/var/singularity/mnt && \
@@ -77,7 +46,6 @@ jobs:
 make -C builddir && \
 sudo make -C builddir install
 - name: Check out code for the container build
-if: ${{ env.keepgoing == 'true' }}
 uses: actions/checkout@v2
 
 - name: Extract repository location
@@ -86,35 +54,16 @@ jobs:
 id: extract_location
 
 - name: Build Container
-if: ${{ env.keepgoing == 'true' }}
-env:
-recipe: ${{ matrix.changed_file }}
 run: |
-ls container
-if [ -f "${{ matrix.changed_file }}" ]; then
-cd container
-wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/1.17.1/ncbi-igblast-1.17.1-x64-linux.tar.gz
-wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.12.0+-x64-linux.tar.gz
-tar -xzvf ncbi-igblast-1.17.1-x64-linux.tar.gz
-tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
-echo '${{ steps.extract_location.outputs.location }}' >> environment_test.yml
-sudo -E singularity build --notest sc-dandelion.sif sc-dandelion_test.def
-tag=$(echo "${recipe/Singularity\./}")
-if [ "$tag" == "Singularity" ]; then
-tag=latest
-fi
-# Build the container and name by tag
-echo "Tag is $tag."
-echo "tag=$tag" >> $GITHUB_ENV
-else
-echo "${{ matrix.changed_file }} is not found."
-echo "Present working directory: $PWD"
-ls
-fi
+ls container
+cd container
+wget https://ftp.ncbi.nih.gov/blast/executables/igblast/release/1.17.1/ncbi-igblast-1.17.1-x64-linux.tar.gz
+wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.12.0/ncbi-blast-2.12.0+-x64-linux.tar.gz
+tar -xzvf ncbi-igblast-1.17.1-x64-linux.tar.gz
+tar -xzvf ncbi-blast-2.12.0+-x64-linux.tar.gz
+echo '${{ steps.extract_location.outputs.location }}' >> environment_test.yml
+sudo -E singularity build --notest sc-dandelion.sif sc-dandelion_test.def
 - name: Test Container
-if: ${{ env.keepgoing == 'true' }}
-env:
-recipe: ${{ matrix.changed_file }}
 run: |
 cd container
 sudo singularity test --writable-tmpfs sc-dandelion.sif
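Besides dropping the changed-files gating, the build step now downloads BLAST+ from an explicit `2.12.0` release directory instead of `LATEST`. The commit message does not give a reason; one plausible explanation (an assumption) is that `LATEST` only holds the newest release, so an archive name pinned to `2.12.0` under it stops resolving once a newer version ships. As a minimal sketch, the hypothetical helpers below derive each download URL from a single version string, so the directory and the file name cannot drift apart:

```python
# Hypothetical helpers that rebuild the pinned download URLs used in the
# "Build Container" step; the function names are illustrative only.
IGBLAST_BASE = "https://ftp.ncbi.nih.gov/blast/executables/igblast/release"
BLAST_BASE = "https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+"


def igblast_url(version: str) -> str:
    """IgBLAST archive inside its versioned release directory."""
    return f"{IGBLAST_BASE}/{version}/ncbi-igblast-{version}-x64-linux.tar.gz"


def blast_url(version: str) -> str:
    """BLAST+ archive inside a versioned directory rather than LATEST."""
    # The same version string names both the directory and the archive,
    # so the two always refer to the same release.
    return f"{BLAST_BASE}/{version}/ncbi-blast-{version}+-x64-linux.tar.gz"


if __name__ == "__main__":
    for url in (igblast_url("1.17.1"), blast_url("2.12.0")):
        print(url)
```

Run as a script, it prints the two URLs that the updated step passes to `wget`.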
6 changes: 3 additions & 3 deletions docs/notebooks/1_dandelion_preprocessing-10x_data.ipynb
@@ -217,7 +217,7 @@
 "source": [
 "## Step 2: Reannotate the V/D/J genes with *igblastn*\n",
 "\n",
-"Like Immcantation, we will reannotate the V(D)J genes with igblastn using the latest IMGT reference databases. However, as of v0.1.13, `pp.reannotate_genes` will use `flavour = 'strict'` to run `igblastn`, imposing lower e-value and higher D-penalty cut-offs. The original behaviour, i.e. that of [*changeo*](https://changeo.readthedocs.io/en/stable/examples/10x.html)'s `AssignGenes.py`, is toggled with `flavour = 'original'`. There is also now an `assign_dj` option (default is `False`), which will use blastn to assign a stricter call for the D and J genes because [igblastn can return random assignments if it cannot detect a V gene](https://www.ncbi.nlm.nih.gov/igblast/faq.html). This will be toggled for TCR data later. All the column headers now adhere to the [*AIRR*](http://docs.airr-community.org/) standard."
+"Like Immcantation, we will reannotate the V(D)J genes with igblastn using the latest IMGT reference databases. However, as of v0.1.13, `pp.reannotate_genes` will use `flavour = 'strict'` to run `igblastn`, imposing lower e-value and higher D-penalty cut-offs. The original behaviour, i.e. that of [*changeo*](https://changeo.readthedocs.io/en/stable/examples/10x.html)'s `AssignGenes.py`, is toggled with `flavour = 'original'`. There is also now an `assign_dj` option (default is `True`), which will use blastn to assign a stricter call for the D and J genes because [igblastn can return random assignments if it cannot detect a V gene](https://www.ncbi.nlm.nih.gov/igblast/faq.html). The tmp folder will also contain a table of all alignments generated in this step (only the top hit is selected for each contig). All the column headers now adhere to the [*AIRR*](http://docs.airr-community.org/) standard."
 ]
 },
 {
@@ -421,7 +421,7 @@
 "\n",
 "Cell Ranger's annotation files provide a *c_gene* column, but rather than simply relying on Cell Ranger's annotation, it is common to use [*immcantation-presto*'s *MaskPrimers.py*](https://presto.readthedocs.io/en/version-0.5.3---license-change/tools/MaskPrimers.html) with a custom primer list. \n",
 "\n",
-"As an alternative, `dandelion` includes a pre-processing function, `pp.assign_isotypes`, that uses *blast* to annotate constant region calls for all contigs and merges the calls into the tsv files. This function will overwrite the output from previous steps and add a *c_call* column at the end, or replace the existing column if it already exists. The Cell Ranger calls are returned as `c_call_10x`.\n",
+"As an alternative, `dandelion` includes a pre-processing function, `pp.assign_isotypes`, that uses *blastn* to annotate constant region calls for all contigs and merges the calls into the tsv files. This function will overwrite the output from previous steps and add a *c_call* column at the end, or replace the existing column if it already exists. The Cell Ranger calls are returned as `c_call_10x`.\n",
 "\n",
 "Further, to deal with incorrect constant gene calls due to insufficient length, a pairwise alignment will be run against [curated sequences](https://immunology.sciencemag.org/content/6/56/eabe6291) that were deemed to be highly specific in distinguishing `IGHA1` vs `IGHA2`, and `IGHG1` to `IGHG4`. I have also curated sets of sequences to help resolve `IGLC3/6/7`, as these are problematic too. If there is insufficient information, the `c_call` will be returned as a combination of the best-aligned sets of sequences. Because the lambda light chains are so similar, extremely ambiguous calls (only able to map to a sequence common to all the light chains) will be returned as `IGLC`. This typically occurs when the constant sequence is very short. Contigs with equal alignment scores between the `IGLC3/6/7` sequences and the common sequence will be returned as a concatenated call; for example, a contig initially annotated as `IGLC3` will be returned as `IGLC,IGLC3`. \n",
 "\n",
@@ -528,7 +528,7 @@
 "ddl.pp.assign_isotypes(samples, blastdb = \"path/to/custom_BCR_constant.fasta\")\n",
 "```\n",
 "\n",
-"This may take a while when dealing with large files; the number of cpus to size of file isn't exactly linear. Nevertheless, I have enabled parallelization as default because there were noticeable improvements in processing speeds with the smaller files. It should work faster with more cpus. The default option will return a summary plot that can be disabled with `plot = False`.\n",
+"The default option will return a summary plot that can be disabled with `plot = False`.\n",
 "\n",
 "Finally, it's worthwhile to manually check the sequences for constant calls returned as IGHA1-2, IGHG1-4 and the light chains, and to correct them if necessary.\n",
 "\n",
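The revised reannotation text says the tmp folder will hold a table of every alignment from this step, with only the top hit kept per contig. The ranking rule is not stated there; the minimal sketch below assumes the lowest e-value wins, with ties broken by the higher bit score, and its `Hit` fields are hypothetical rather than the real table's column names:

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment row; field names are hypothetical, not dandelion's columns."""
    contig: str
    call: str
    evalue: float
    bitscore: float


def top_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep a single best alignment per contig."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.contig)
        # Assumed ranking: lowest e-value first, then highest bit score.
        if current is None or (hit.evalue, -hit.bitscore) < (current.evalue, -current.bitscore):
            best[hit.contig] = hit
    return best


if __name__ == "__main__":
    # Toy rows for illustration only.
    table = [
        Hit("contig_1", "IGHJ4", 1e-5, 30.0),
        Hit("contig_1", "IGHJ6", 1e-7, 35.0),
        Hit("contig_2", "IGHJ3", 1e-10, 50.0),
    ]
    for contig, hit in top_hits(table).items():
        print(contig, hit.call)
```

A single pass with a dictionary keeps memory proportional to the number of contigs rather than the number of alignments.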

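The isotype text in the notebook describes how ambiguous lambda constant calls are reported: when the common lambda sequence and an `IGLC3/6/7`-specific sequence align equally well, both labels are returned, e.g. `IGLC,IGLC3`. A minimal sketch of that tie rule, assuming one alignment score per curated reference and using a toy ungapped match count in place of a real pairwise aligner; the function names and the scoring are hypothetical, not dandelion's implementation:

```python
from typing import Dict


def ungapped_score(query: str, ref: str) -> int:
    """Toy stand-in for pairwise alignment: best count of identical bases
    over every ungapped offset of query against ref."""
    best = 0
    for shift in range(-len(query) + 1, len(ref)):
        matches = sum(
            1
            for i, base in enumerate(query)
            if 0 <= i + shift < len(ref) and ref[i + shift] == base
        )
        best = max(best, matches)
    return best


def resolve_lambda_call(query: str, references: Dict[str, str]) -> str:
    """Return every label whose reference ties for the top score, comma-joined.

    The key "IGLC" stands for the sequence shared by the lambda constant genes,
    so a query that only matches it best is reported as plain "IGLC".
    """
    scores = {label: ungapped_score(query, seq) for label, seq in references.items()}
    top = max(scores.values())
    return ",".join(sorted(label for label, score in scores.items() if score == top))
```

With toy references `{"IGLC": "ACGTACGT", "IGLC3": "ACGTTTTT", "IGLC6": "GGGGGGGG"}`, the query `ACGT` ties between the common and `IGLC3` sequences and resolves to `IGLC,IGLC3`, while `ACGTAC` resolves to `IGLC` alone.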