Commit

Refactor assemble abundances
After running the previous version of the pipeline on much larger datasets, it's clear that assembling a single table of all gene-level abundances is not computationally tractable.

The only impact of these changes on the pipeline's output is that the /abund/gene/wide table (and its matching feather file) is no longer produced. Otherwise, the rest of the pipeline runs exactly as before.
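To make the scaling argument concrete, here is a back-of-the-envelope memory estimate for a dense table with one row per gene (or CAG) and one column per specimen, stored as 64-bit floats. The gene, CAG and specimen counts are hypothetical illustrations, not figures from this commit, and the estimate ignores index labels and library overhead.

```python
def dense_table_gb(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Memory in GB (10**9 bytes) needed to store every cell of an n_rows x n_cols table.

    Assumes one fixed-width value per cell (8 bytes for float64) and ignores
    row/column labels and any per-object overhead.
    """
    return n_rows * n_cols * bytes_per_value / 1e9


# Hypothetical dataset sizes, for illustration only
n_genes = 10_000_000   # genes in a large gene catalog
n_cags = 200_000       # co-abundant gene groups (CAGs)
n_specimens = 1_000

print(f"gene-level wide table: {dense_table_gb(n_genes, n_specimens):.1f} GB")
print(f"CAG-level wide table:  {dense_table_gb(n_cags, n_specimens):.1f} GB")
```

Under these assumptions the gene-level table needs about 80 GB, while the CAG-level table needs about 1.6 GB. The gene-level table grows with the size of the gene catalog, while the CAG-level table (/abund/cag/wide, which remains in the output) grows with the smaller number of CAGs.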
sminot committed Jul 19, 2020
1 parent 3bcbb6a commit 1954b12
Showing 6 changed files with 235 additions and 174 deletions.
1 change: 0 additions & 1 deletion bin/validate_geneshot_output.py
@@ -27,7 +27,6 @@ def validate_results_hdf(results_hdf, check_corncob = False):
 "/annot/gene/cag",
 "/annot/gene/all",
 "/annot/cag/all",
-"/abund/gene/wide",
 "/abund/cag/wide",
 "/ordination/pca",
 "/ordination/tsne",
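The validation script checks that the results HDF5 store contains a fixed set of keys, and after this commit /abund/gene/wide is no longer one of them. The following is a minimal sketch of that kind of check. It assumes the store's keys have already been listed, for example with pandas HDFStore.keys(), which returns keys with a leading "/". REQUIRED_KEYS contains only the subset visible in this hunk, and find_missing_keys is a hypothetical helper, not the script's actual code.

```python
from typing import Iterable, List

# Subset of the required keys visible in this hunk; the full list in
# validate_results_hdf is longer. "/abund/gene/wide" is no longer required.
REQUIRED_KEYS = [
    "/annot/gene/cag",
    "/annot/gene/all",
    "/annot/cag/all",
    "/abund/cag/wide",
    "/ordination/pca",
    "/ordination/tsne",
]


def find_missing_keys(present: Iterable[str], required: List[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys absent from `present`, in the order they are listed."""
    present_set = set(present)
    return [key for key in required if key not in present_set]


# Hypothetical key listings, for illustration only
complete = list(REQUIRED_KEYS)
older_run = REQUIRED_KEYS + ["/abund/gene/wide"]  # extra keys are ignored
incomplete = [k for k in REQUIRED_KEYS if k != "/abund/cag/wide"]

print(find_missing_keys(complete))    # []
print(find_missing_keys(older_run))   # []
print(find_missing_keys(incomplete))  # ['/abund/cag/wide']
```

Because this sketch ignores extra keys, a results file from an earlier run that still contains /abund/gene/wide passes the same check.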
23 changes: 12 additions & 11 deletions local_tests/test.sh
@@ -12,21 +12,22 @@ set -e
 # 3. Run this script

 # Test with preprocessing and a formula
-NXF_VER=19.10.0 nextflow run main.nf \
+NXF_VER=20.04.1 nextflow run main.nf \
 -c nextflow.config \
 -profile testing \
---manifest data/mock.manifest.2.csv \
+--manifest data/mock.manifest.csv \
 --nopreprocess \
---output output0 \
+--output output \
 --hg_index data/hg_chr_21_bwa_index.tar.gz \
 --formula "label1 + label2" \
 --distance_threshold 0.5 \
 -w work/ \
 --noannot \
--resume
+-resume \
+-with-docker ubuntu:20.04

 # # Test with preprocessing and a formula
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --manifest data/mock.manifest.csv \
@@ -41,7 +42,7 @@ NXF_VER=19.10.0 nextflow run main.nf \
 # -resume

 # # Test with preprocessing and no formula
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --manifest data/mock.manifest.csv \
@@ -54,7 +55,7 @@ NXF_VER=19.10.0 nextflow run main.nf \
 # -resume

 # # Test with formula and no preprocessing
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --nopreprocess \
@@ -68,7 +69,7 @@ NXF_VER=19.10.0 nextflow run main.nf \
 # -resume

 # # Test with no formula and no preprocessing
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --nopreprocess \
@@ -80,7 +81,7 @@ NXF_VER=19.10.0 nextflow run main.nf \
 # -resume

 # # Test with the gene catalog made in a previous round
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --gene_fasta output1/ref/genes.fasta.gz \
@@ -94,7 +95,7 @@ NXF_VER=19.10.0 nextflow run main.nf \


 # # Test with the gene catalog made in a previous round and whole genome alignment
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --gene_fasta output1/ref/genes.fasta.gz \
@@ -111,7 +112,7 @@ NXF_VER=19.10.0 nextflow run main.nf \
 # -resume

 # # Test with de novo assembly and whole genome alignment
-# NXF_VER=19.10.0 nextflow run main.nf \
+# NXF_VER=20.04.1 nextflow run main.nf \
 # -c nextflow.config \
 # -profile testing \
 # --nopreprocess \
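For reference, the updated active test run in local_tests/test.sh can be expressed as an argument list, which may be handy when driving it from a Python test harness. This is a hypothetical sketch. build_test_command and run_test are not part of the repository, and the sketch assumes the new side of the diff uses data/mock.manifest.csv and the output directory "output". The run itself is not executed here because it requires Nextflow and Docker.

```python
import os
import shlex
import subprocess


def build_test_command(output_dir: str = "output") -> list:
    """Rebuild the active `nextflow run` call from local_tests/test.sh as an argv list."""
    return [
        "nextflow", "run", "main.nf",
        "-c", "nextflow.config",
        "-profile", "testing",
        "--manifest", "data/mock.manifest.csv",
        "--nopreprocess",
        "--output", output_dir,
        "--hg_index", "data/hg_chr_21_bwa_index.tar.gz",
        "--formula", "label1 + label2",  # one argument, as the shell quotes it
        "--distance_threshold", "0.5",
        "-w", "work/",
        "--noannot",
        "-resume",
        "-with-docker", "ubuntu:20.04",
    ]


def run_test(cmd: list, nxf_ver: str = "20.04.1") -> subprocess.CompletedProcess:
    """Run the command with NXF_VER set, mirroring the `NXF_VER=20.04.1` prefix in test.sh."""
    env = dict(os.environ, NXF_VER=nxf_ver)
    return subprocess.run(cmd, env=env, check=True)


cmd = build_test_command()
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
# run_test(cmd)  # requires nextflow on PATH and a working Docker install
```

Passing NXF_VER through the environment rather than as an argument matches how the shell script pins the Nextflow version.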
5 changes: 0 additions & 5 deletions main.nf
@@ -399,10 +399,6 @@ workflow {
 combineReads.out,
 params.output_prefix
 )
-// Publish the gene abundance feather file
-publishGeneAbundances(
-alignment_wf.out.gene_abund_feather
-)

 // ########################
 // # STATISTICAL ANALYSIS #
@@ -439,7 +435,6 @@ workflow {

 collectAbundances(
 alignment_wf.out.cag_csv,
-alignment_wf.out.gene_abund_feather,
 alignment_wf.out.cag_abund_feather,
 countReadsSummary.out,
 manifest_file,