Mapping error at .tmpMap//tmp.test2.*.SJ.out.tab' #378

Open
aliibarry opened this issue Oct 26, 2023 · 5 comments

@aliibarry

Trying to analyse some Smart-seq3 data and can't get past the mapping step. Any suggestions would be much appreciated. I've remade my index multiple times (STAR --version gives 2.7.3a, even though the warning below flags the index as 2.7.1a?), and have also tried using my own dependencies and STAR 2.7.11a, as well as a fresh zUMIs pull (working with 2.9.7e).

bash zUMIs/zUMIs.sh -c -y patch-seq/patchseq.yaml

Currently using the YAML from the Smart-seq3 example (https://github.com/sandberg-lab/Smart-seq3/blob/master/allele_level_expression/mouse_cross.yaml), with num_threads: and mem_limit: adjusted and no barcode_file: set.

Output is as follows:

Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!


 You provided these parameters:
 YAML file:     patch-seq/patchseq.yaml
 zUMIs directory:               /home/amb/zUMIs
 STAR executable                STAR
 samtools executable            samtools
 pigz executable                pigz
 Rscript executable             Rscript
 RAM limit:   100
 zUMIs version 2.9.7e


Thu Oct 26 18:22:15 CEST 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Filtering...
Thu Oct 26 19:26:56 CEST 2023
[1] "84 barcodes detected."
[1] "1705037 reads were assigned to barcodes that do not correspond to intact cells."
[1] "Found 1739 daughter barcodes that can be binned into 84 parent barcodes."
[1] "Binned barcodes correspond to 1290360 reads."
Mapping...
[1] "2023-10-26 19:36:46 CEST"
Oct 26 19:36:50 ..... started STAR run
Oct 26 19:36:52 ..... loading genome
Oct 26 19:36:50 ..... started STAR run
Oct 26 19:36:52 ..... loading genome
Oct 26 19:36:50 ..... started STAR run
Oct 26 19:36:52 ..... loading genome
cp: cannot stat '/home/amb/patch-seq/zumis_out/zUMIs_output/.tmpMap//tmp.test2.*.SJ.out.tab': No such file or directory
[main_cat] ERROR: input is not BAM or CRAM
[main_cat] ERROR: input is not BAM or CRAM
Thu Oct 26 19:41:39 CEST 2023
Counting...
[1] "2023-10-26 19:41:49 CEST"
[1] "1.5e+08 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/amb/patch-seq/zumis_out/test2.final_annot.gtf"
[E::hts_open_format] Failed to open file /home/amb/patch-seq/zumis_out/test2.filtered.tagged.Aligned.out.bam
samtools view: failed to open "/home/amb/patch-seq/zumis_out/test2.filtered.tagged.Aligned.out.bam" for reading: No such file or directory
[E::hts_open_format] Failed to open file /home/amb/patch-seq/zumis_out/test2.filtered.tagged.Aligned.out.bam
samtools view: failed to open "/home/amb/patch-seq/zumis_out/test2.filtered.tagged.Aligned.out.bam" for reading: No such file or directory
Error in gsub("SN:", "", chr) : object 'chr' not found
Calls: .makeSAF ... .chromLengthFilter -> [ -> [.data.table -> eval -> eval -> gsub
In addition: Warning message:
In data.table::fread(bread, col.names = c("chr", "len"), header = F) :
  File '/tmp/RtmpKL80mU/file2bd85af4ee1a' has size 0. Returning a NULL data.table.
Execution halted
Thu Oct 26 19:42:03 CEST 2023
Loading required package: yaml
Loading required package: Matrix
[1] "loomR found"
Error in gzfile(file, "rb") : cannot open the connection
Calls: rds_to_loom -> readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/amb/patch-seq/zumis_out/zUMIs_output/expression/test2.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
Thu Oct 26 19:42:06 CEST 2023
Descriptive statistics...
[1] "I am loading useful packages for plotting..."
[1] "2023-10-26 19:42:06 CEST"
Error in gzfile(file, "rb") : cannot open the connection
Calls: readRDS -> gzfile
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/home/amb/patch-seq/zumis_out/zUMIs_output/expression/test2.dgecounts.rds', probable reason 'No such file or directory'
Execution halted
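
Reading the log above, everything after the Mapping step looks like a cascade: STAR never writes the aligned BAM, so Counting reads an empty header, and the later stages cannot find dgecounts.rds. A minimal sketch for confirming which outputs are missing before digging into the later errors. The file name patterns are copied from the paths in the log; the function name and the idea of checking only these two files are assumptions for illustration, not part of zUMIs.

```python
from pathlib import Path


def missing_outputs(out_dir, project):
    """Return expected zUMIs outputs that are absent or empty.

    The two paths mirror the ones named in the error messages above;
    this is not an exhaustive list of what zUMIs produces.
    """
    out = Path(out_dir)
    expected = [
        out / f"{project}.filtered.tagged.Aligned.out.bam",
        out / "zUMIs_output" / "expression" / f"{project}.dgecounts.rds",
    ]
    return [str(p) for p in expected if not p.is_file() or p.stat().st_size == 0]


if __name__ == "__main__":
    # out_dir and project name as they appear in the log above
    for path in missing_outputs("/home/amb/patch-seq/zumis_out", "test2"):
        print("missing or empty:", path)
```

If the aligned BAM is the first file reported, the Counting and statistics errors are consequences and the mapping step is the one to debug.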

I've tried re-running this from the mapping step using which_Stage: Mapping in the YAML and get a slightly different error, eventually ending in Execution halted.

Thu Oct 26 20:08:07 CEST 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Mapping...
[1] "2023-10-26 20:08:07 CEST"

EXITING because of FATAL INPUT ERROR: --readFilesType SAM requires specifying SE or PE reads
SOLUTION: specify --readFilesType SAM SE for single-end reads or --readFilesType SAM PE for paired-end reads

Oct 26 20:08:10 ...... FATAL ERROR, exiting
Thu Oct 26 20:08:10 CEST 2023
Counting...

As an aside: I'm trying to get this working on an HPC in parallel, but am still working through permission issues with the support team. Any tips there would also be appreciated. The error is below.

starting zumi
Warning: YAML file doesn't include 'Rscript_exec' option; setting to 'Rscript'
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!
mkdir: cannot create directory ‘/var/spool/slurmd/job6639867/zUMIs-env’: Permission denied
/data/userXXX/zUMIs/zUMIs.sh: line 155: /var/spool/slurmd/job6639867/zUMIs-miniconda.tar.bz2: Permission denied
@cziegenhain
Collaborator

Hi,

That is indeed odd. Can you share the exact YAML file you use?
Do you get an unmapped.bam file in your outputs? If yes, how does it look (e.g. the first few lines of samtools view)?

The warning about the STAR version should be OK: STAR doesn't always write the precise version number into its index files.
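
To see which version string an index actually carries, one option is to read it straight from the index directory. This is a minimal sketch under the assumption that the index contains a genomeParameters.txt file with a whitespace-separated versionGenome line; the function name is made up for illustration.

```python
from pathlib import Path


def index_genome_version(index_dir):
    """Return the versionGenome value from genomeParameters.txt, or None.

    Assumes the STAR index directory holds a genomeParameters.txt whose
    lines are 'key<whitespace>value'; returns None if the file or the
    key is absent.
    """
    params = Path(index_dir) / "genomeParameters.txt"
    if not params.is_file():
        return None
    for line in params.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "versionGenome":
            return fields[1]
    return None


if __name__ == "__main__":
    # Index path taken from the YAML posted in this thread
    print(index_genome_version("/home/amb/hg_genome_STAR2.7.3a"))
```

If this prints an older version than STAR --version reports, that alone would explain the warning without indicating a broken index.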

Best,
Christoph

@aliibarry
Author

Hiya,

YAML is:

project: trial
sequence_files:
  file1:
    name: /home/amb/patchseq/Undetermined_S0_R1_001.fastq.gz
    base_definition:
      - cDNA(23-50)
      - UMI(12-19)
    find_pattern: ATTGCGCAATG
  file2:
    name: /home/amb/patchseq/Undetermined_S0_R2_001.fastq.gz
    base_definition:
      - cDNA(1-50)
  file3:
    name: /home/amb/patchseq/Undetermined_S0_I1_001.fastq.gz
    base_definition:
      - BC(1-8)
  file4:
    name: /home/amb/patchseq/Undetermined_S0_I2_001.fastq.gz
    base_definition:
      - BC(1-8)
reference:
  STAR_index: /home/amb/hg_genome_STAR2.7.3a #made without overhang info
    #pigz_exec: /home/amb/miniconda3/bin/pigz
    #STAR_exec: /home/amb/STAR-2.7.11a/source/STAR
    #samtools_exec: /home/amb/samtools-1.18/samtools
  Rscript_exec: /usr/bin/R
  GTF_file: /home/amb/gencode.v44.primary_assembly.annotation.gtf
  additional_STAR_params: '--limitSjdbInsertNsj 2000000 --clip3pAdapterSeq CTGTCTCTTATACACATCT'
  additional_files:
out_dir: /home/amb/patchseq/out
num_threads: 1
mem_limit: 31
filter_cutoffs:
  BC_filter:
    num_bases: 3
    phred: 20
  UMI_filter:
    num_bases: 3
    phred: 20
barcodes:
  barcode_num: ~
  barcode_file: 
  automatic: no
  BarcodeBinning: 1
  nReadsperCell: 100
  demultiplex: yes
counting_opts:
  introns: yes
  downsampling: '0'
  strand: 0
  Ham_Dist: 1
  write_ham: yes
  velocyto: no
  primaryHit: yes
  twoPass: no
make_stats: yes
which_Stage: Filtering

There is an unmapped.bam, but it seems incomplete? For out_dir/trial.filtered.tagged.unmapped.bam, this is the head:

VH01324:51:AAF5FKVM5:1:1101:18231:1000	77	*	0	0	*	*	0	0	GCTTTGTATAAACCAGTGATTTTACTACAAAAAACACTGTCCTTGAAAGA	CCCCCCCCCCC;;CCC;C;CCCCCCCCCCCCCCCCCCCCCCCC;CCCCCC	BX:Z:ATCTCAGGTACTCCTT	BC:Z:ATCTCAGGTACTCCTT	UB:Z:	QB:Z:CC;CC;CCCCCCCCCC	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18231:1000	141	*	0	0	*	*	0	0	CTTCTTAAGTGGAATATTCTAATAAGCTACCTTTTGTAAGTGCCATGTTT	CCCCCCCCCCCC-CC-CCCCCCC;CCCCCCCCCC-CCCCCCC-CCCCCCC	BX:Z:ATCTCAGGTACTCCTT	BC:Z:ATCTCAGGTACTCCTT	UB:Z:	QB:Z:CC;CC;CCCCCCCCCC	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18307:1000	77	*	0	0	*	*	0	0	CCCAGAGAGTGGGTCAGCTGGAAGCCCTGGAGACAGTCACAGCTCTCTGA	CCC-C;C-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;	BX:Z:CGAGGCTGCGGAGAGA	BC:Z:CGAGGCTGCGGAGAGA	UB:Z:	QB:Z:CC-CCCCCCCC-CC;C	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18307:1000	141	*	0	0	*	*	0	0	GCCTGGCACCATGGACTCTGTCAGGTCTGGACCCTTCGGCCAGATCTTCA	;CCCCCC;CCCCCCCC;CCCCC;CCCCCCCCCC;CCCCCCCCC;-C;;CC	BX:Z:CGAGGCTGCGGAGAGA	BC:Z:CGAGGCTGCGGAGAGA	UB:Z:	QB:Z:CC-CCCCCCCC-CC;C	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18345:1000	77	*	0	0	*	*	0	0	TCCCTGGAGCGGCAGCTCAGCGACATCGAGGAGCGCCACAACCACGACCT	CCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC	BX:Z:CGTCCTAGCTCCTTAC	BC:Z:CGTACTAGCTCCTTAC	UB:Z:	QB:Z:CCC-CC;CCCCCCCCC	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18345:1000	141	*	0	0	*	*	0	0	GTATACAGTGGCCCAGTGATGCTTCCTGCAAATGTGCTAAATCTAGTCTC	;CCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC	BX:Z:CGTCCTAGCTCCTTAC	BC:Z:CGTACTAGCTCCTTAC	UB:Z:	QB:Z:CCC-CC;CCCCCCCCC	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18383:1000	77	*	0	0	*	*	0	0	AAAGAAGATATTGCAATGTGGGAAGTAAATGAAGCCTTTAGTCTGGTTGT	CC;CCCC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC	BX:Z:CTCTCTACAGGCTTAG	BC:Z:CTCTCTACAGGCTTAG	UB:Z:	QB:Z:CCCCCCCCCCCCC-;C	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18383:1000	141	*	0	0	*	*	0	0	GCATGAGTCAAATGACCAACAATCCTGGCTCCAGACATCCCAATTGGATG	C-CCC-CCCC;CCCCCCC-CCCCCCCCCCCCCCC;C;CCCCCCCCCCCCC	BX:Z:CTCTCTACAGGCTTAG	BC:Z:CTCTCTACAGGCTTAG	UB:Z:	QB:Z:CCCCCCCCCCCCC-;C	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18459:1000	77	*	0	0	*	*	0	0	GATATAGTTTGAGTATTTGTCCTCTTCAAATCTCATGTTGAAATGTTATC	CCC;CCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC	BX:Z:GCTCATGAATTAGACG	BC:Z:GCTCATGAATTAGACG	UB:Z:	QB:Z:CCCCCCCCCCCCCCCC	QU:Z:
VH01324:51:AAF5FKVM5:1:1101:18459:1000	141	*	0	0	*	*	0	0	TTTTAAAACCAGCTCTCACATGAGCTAATGGAATAAGAACTCACTCATTA	CCCCCC;CCCCCCCCCCCCCCC;CCCCCCCCC-C;CC-CCCCCCCCCCCC	BX:Z:GCTCATGAATTAGACG	BC:Z:GCTCATGAATTAGACG	UB:Z:	QB:Z:CCCCCCCCCCCCCCCC	QU:Z:

@cziegenhain
Collaborator

Hey,

OK, the unmapped BAM actually looks quite good, and it clearly did set the PE flags on the reads correctly, which is what STAR complained about.
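
For reference, the flags 77 and 141 in the samtools view output can be decoded with the standard SAM flag bits. A minimal sketch; the helper names are made up for illustration, and only the bits relevant here are listed.

```python
# Standard SAM flag bits (subset relevant to the records above)
FLAG_BITS = [
    (0x1, "paired"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x40, "read1"),
    (0x80, "read2"),
]


def describe_flag(flag):
    """Return the names of the listed SAM flag bits that are set."""
    return [name for bit, name in FLAG_BITS if flag & bit]


def layout(flags):
    """Return 'PE' if every flag has the paired bit, 'SE' if none do, else 'mixed'."""
    paired = [bool(f & 0x1) for f in flags]
    if all(paired):
        return "PE"
    if not any(paired):
        return "SE"
    return "mixed"


print(describe_flag(77))   # ['paired', 'unmapped', 'mate_unmapped', 'read1']
print(describe_flag(141))  # ['paired', 'unmapped', 'mate_unmapped', 'read2']
print(layout([77, 141]))   # PE
```

So the unmapped BAM itself is paired-end; the missing SE/PE argument in the STAR error has to come from somewhere other than the read flags.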

Anyway, my gut feeling is that the commented-out lines in the "reference" section may disturb things in the YAML! Please remove them completely and check again:

    #pigz_exec: /home/amb/miniconda3/bin/pigz
    #STAR_exec: /home/amb/STAR-2.7.11a/source/STAR
    #samtools_exec: /home/amb/samtools-1.18/samtools
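
One way to do this without hand-editing is to drop every full-line comment before running the pipeline. A minimal sketch; the function name is hypothetical, and it deliberately leaves trailing comments (such as the one after STAR_index) alone, because a '#' inside a quoted value cannot be told apart from a comment without a real YAML parser.

```python
def strip_comment_lines(yaml_text):
    """Drop lines whose first non-blank character is '#'.

    Trailing comments on value lines are kept as-is; removing those
    safely needs a YAML parser rather than string handling.
    """
    kept = [
        line
        for line in yaml_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    example = (
        "reference:\n"
        "  STAR_index: /home/amb/hg_genome_STAR2.7.3a\n"
        "    #pigz_exec: /home/amb/miniconda3/bin/pigz\n"
        "  GTF_file: /home/amb/gencode.v44.primary_assembly.annotation.gtf\n"
    )
    print(strip_comment_lines(example), end="")
```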

@aliibarry
Author

Thanks for the quick reply. I removed all comments from the YAML, but am getting the same issues. The unmapped BAM is still generated, but the run fails during the mapping stage.

I did a completely fresh run as well, but this is the error when starting from Mapping with bash zUMIs/zUMIs.sh -c -y patchseq/patchseq.yaml:

Warning: YAML file doesn't include 'pigz_exec' option; setting to 'pigz'
Warning: YAML file doesn't include 'STAR_exec' option; setting to 'STAR'
Using miniconda environment for zUMIs!
 note: internal executables will be used instead of those specified in the YAML file!


 You provided these parameters:
 YAML file:	patchseq/patchseq.yaml
 zUMIs directory:		/home/amb/zUMIs
 STAR executable		STAR
 samtools executable		samtools
 pigz executable		pigz
 Rscript executable		Rscript
 RAM limit:   31
 zUMIs version 2.9.7e 


Tue Oct 31 02:20:58 PM CET 2023
WARNING: The STAR version used for mapping is 2.7.3a and the STAR index was created using the version 2.7.1a. This may lead to an error while mapping. If you encounter any errors at the mapping stage, please make sure to create the STAR index using STAR 2.7.3a.
Mapping...
[1] "2023-10-31 14:20:58 CET"

EXITING because of FATAL INPUT ERROR: --readFilesType SAM requires specifying SE or PE reads
SOLUTION: specify --readFilesType SAM SE for single-end reads or --readFilesType SAM PE for paired-end reads

Oct 31 14:20:59 ...... FATAL ERROR, exiting
Tue Oct 31 02:20:59 PM CET 2023
Counting...
[1] "2023-10-31 14:21:02 CET"
[1] "46500000 Reads per chunk"
[1] "Loading reference annotation from:"
[1] "/home/amb/patchseq/out2/trial.final_annot.gtf"
Error in gsub("SN:", "", chr) : object 'chr' not found
Calls: .makeSAF ... .chromLengthFilter -> [ -> [.data.table -> eval -> eval -> gsub
In addition: Warning message:
In data.table::fread(bread, col.names = c("chr", "len"), header = F) :
  File '/tmp/RtmpdYJWcf/file69191ccf16f2' has size 0. Returning a NULL data.table.
Execution halted

Possibly relevant: during one trial I saw the error "Fastq files are not in the same order", but I haven't managed to reproduce it. I think it was because I was overwriting the output directory?

@aliibarry
Author

Just updating for anyone else seeing the same issues - I never resolved this and instead switched to a kallisto-bustools pipeline, which now has a smart-seq3 option. See biostars post.

Another option that worked for me was umi_tools > samtools > umi_tools dedup > featureCounts.
