Error at SIMPLEAF_INDEX for user supplied genome/annotation #253

Open
jeremymsimon opened this issue Jul 19, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@jeremymsimon

Description of the bug

I'm having trouble getting the test data to run via Singularity with alevin, specifically when I supply my own genome/annotation. It fails at SIMPLEAF_INDEX.

My command was as follows. Note that I specified -r 0e5fc8b to grab the dev branch, given issues #245 and #246; the same error occurs with -r 2.3.2:

nextflow run nf-core/scrnaseq \
   -r 0e5fc8b \
   -profile test \
   -c /jsimonlab/pipelines/nfcore/nextflow.config \
   --outdir test \
   -work-dir /jsimonlab/scratch/jsimon/nextflow/work \
   --fasta GRCh38.primary_assembly.genome.fa \
   --gtf gencode.v43.annotation.gtf \
   --protocol 10XV2 \
   --aligner alevin

My config file is rather simple, and I have confirmed that it works for nf-core/rnaseq and nf-core/atacseq:

process {
	beforeScript =
	"""
	module load singularity
	export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.19.0.7-1.el7_9.x86_64
	export PATH=$PATH:$JAVA_HOME/bin:/cluster/systems/bin
	"""
	executor = 'slurm'
}

singularity {
	enabled = true
	autoMounts = true
	cacheDir = "${SINGULARITY_CACHE_DIR}"
}

The error produced is:

N E X T F L O W  ~  version 23.04.1
[...]
ERROR ~ Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX (GRCh38.primary_assembly.genome_genes.gtf)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX (GRCh38.primary_assembly.genome_genes.gtf)` terminated with an error exit status (1)

Command executed:

  # export required var
  export ALEVIN_FRY_HOME=.
  
  # prep simpleaf
  simpleaf set-paths
  
  # run simpleaf index
  simpleaf \
      index \
      --threads 2 \
      --fasta GRCh38.primary_assembly.genome.fa \
      --gtf GRCh38.primary_assembly.genome_genes.gtf \
      --rlen 91 \
      -o salmon
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX":
      simpleaf: $(simpleaf -V | tr -d '\n' | cut -d ' ' -f 2)
      salmon: $(salmon --version | sed -e "s/salmon //g")
  END_VERSIONS

Command exit status:
  1

Command output:
  2023-07-19T13:07:34.313588Z  INFO simpleaf::utils::prog_utils: could not find piscem executable, so salmon will be required.
  found `salmon` in the PATH at /usr/local/bin/salmon
  found `alevin-fry` in the PATH at /usr/local/bin/alevin-fry
  found `pyroe` in the PATH at /usr/local/bin/pyroe
  2023-07-19T13:07:42.996853Z  INFO simpleaf: pyroe cmd : /usr/local/bin/pyroe make-splici GRCh38.primary_assembly.genome.fa GRCh38.primary_assembly.genome_genes.gtf 91 salmon/ref
  2023-07-19T13:09:45.799564Z ERROR simpleaf::utils::prog_utils: command unsuccessful (signal: 9 (SIGKILL)): "/usr/local/bin/pyroe" "make-splici" "GRCh38.primary_assembly.genome.fa" "GRCh38.primary_assembly.genome_genes.gtf" "91" "salmon/ref"

Command error:
  Error: pyroe failed to return succesfully ExitStatus(unix_wait_status(9))

This looks a lot like issue #191.
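
For reference, the `unix_wait_status(9)` in the error means pyroe was terminated by signal 9 rather than exiting on its own, which matches the `signal: 9 (SIGKILL)` line in the log. The following is a minimal shell sketch of how that number maps to a signal name and to the 128+N exit status a shell reports for a killed child. The variable names `sig`, `name`, and `status` are only for illustration. SIGKILL often comes from the kernel OOM killer or from a scheduler enforcing a memory limit, but nothing in the log above confirms that this is what happened in this run.

```shell
# Signal number taken from the simpleaf log line "signal: 9 (SIGKILL)".
sig=9

# kill -l <number> prints the signal's name without the SIG prefix.
name=$(kill -l "$sig")
echo "signal $sig is SIG$name"

# Reproduce the situation with a throwaway child shell that kills itself.
# A shell reports a child terminated by signal N as exit status 128+N.
status=0
sh -c 'kill -9 $$' 2>/dev/null || status=$?
echo "shell-visible exit status: $status"
```

If a job log shows exit status 137 instead of a signal number, it describes the same event, since 128 + 9 = 137.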

However, the exact same setup runs successfully when I specify --genome hg38 instead of my own files:

nextflow run nf-core/scrnaseq \
   -r 0e5fc8b \
   -profile test \
   -c /jsimonlab/pipelines/nfcore/nextflow.config \
   --outdir test \
   -work-dir /jsimonlab/scratch/jsimon/nextflow/work \
   --genome hg38 \
   --protocol 10XV2 \
   --aligner alevin

[...]

-[nf-core/scrnaseq] Pipeline completed successfully-
Completed at: 19-Jul-2023 09:27:10
Duration    : 9m 57s
CPU hours   : 0.2
Succeeded   : 16

I also know that the genome/annotation files are not corrupt, as the exact same files work fine for nf-core/rnaseq in a full-scale run on real data.

Is this a known issue? Or is there something else I'm missing here?

As a side note, specifying all three of --fasta, --gtf, and --transcript_fasta causes a different failure at the SIMPLEAF_INDEX step, since that step accepts either the genome FASTA or the transcript FASTA but not both. This is not clear from the documentation, which (at least to me) implies that all three are required parameters:

error: the argument '--fasta <FASTA>' cannot be used with '--ref-seq <REF_SEQ>'
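
To illustrate the conflict, here is a minimal sketch of a pre-flight check that could be run before launching the pipeline. The helper name `check_ref_inputs` is hypothetical and is not part of nf-core/scrnaseq or simpleaf. It also assumes, based only on the error message above, that the pipeline's --transcript_fasta ends up as simpleaf's --ref-seq.

```shell
# Hypothetical helper: not part of nf-core/scrnaseq or simpleaf.
# Arguments: genome FASTA, transcript FASTA (pass "" for one that is unset).
check_ref_inputs() {
    genome_fasta=$1
    transcript_fasta=$2

    # The conflict reported above: a genome FASTA and a transcript FASTA
    # cannot both be handed to the index step.
    if [ -n "$genome_fasta" ] && [ -n "$transcript_fasta" ]; then
        echo "error: supply either --fasta (with --gtf) or --transcript_fasta, not both" >&2
        return 1
    fi
    return 0
}

# Usage: only the genome FASTA is set, so the check passes.
check_ref_inputs "GRCh38.primary_assembly.genome.fa" "" && echo "inputs look consistent"
```

The check only covers the one conflict seen here; it does not verify that the files exist or that a GTF accompanies the genome FASTA.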

Command used and terminal output

No response

Relevant files

No response

System information

No response

jeremymsimon added the bug (Something isn't working) label on Jul 19, 2023