Total alignments : 0 / featureCount Results #115

cagataykos · 2021-06-26T13:02:35Z

I have human RNA-Seq dataset it has two different barcodes in the different folder. I aligned with that command
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam
I try to quantify and counts using with Subread featureCounts function. In the subread results, there is a problem with one of the bam files. I downloaded reference and gtf files from GENCODE.
I checked the bam file with samtools view -H first.bam-second.bam I saw that I followed the same steps for each bam file.
In the IGV results, I saw matches and alignment for all bam files.

Do you have any suggestions the solve this problem? What am I doing wrong?

featureCounts -T 8 -a gencode.v38.chr_patch_hapl_scaff.annotation.gtf -g 'transcript_id' -o readcouts.txt bam/*.bam

|| Total alignments : 11214480 ||
|| Successfully assigned alignments : 4051945 (36.1%) ||
|| Running time : 2.67 minutes

|| Total alignments : 0 ||
|| Successfully assigned alignments : 0 ||
|| Running time : 2.89 minutes

I also tried with Salmon in the salmon alignment-based quantification results bam file has huge differences between each other.
salmon quant --ont -t reference.fa -l A -a first.bam -o salmon_quant1
Total # of mapped reads : 5465357

of uniquely mapped reads : 328808350000000

ambiguously mapped reads : 2177274
salmon quant --ont -t reference.fa -l A -a second.bam -o salmon_quant2
Completed first pass through the alignment file.

Total # of mapped reads : 3843632

of uniquely mapped reads : 2552463

ambiguously mapped reads : 1291169

casmerifle · 2024-02-25T12:18:46Z

Apologies for the necro but I've encountered a similar issue and would like to find out the issue. I ran nanoseq using a custom fasta and gtf, opting for stringtie2 to perform transcript quantification. Like @cagataykos, I've had transcripts identified (that is, a subset of the unique transcripts in my gtf) but counts of 0 for all of them.

To troubleshoot, I've verified that:
(i) both chromosome naming styles ("chr1" v. "1") result in the same output,
(ii) running stringtie2 independently using the intermediate minimap2 output of nanoseq results in the error:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!
Please make sure the -G annotation file uses the same naming convention for the genome sequences.

I was considering the possibility that the reference used for minimap2 may not be compatible with the reference used in stringtie2. However, given I supplied to the parameters csv a unique transcriptome fa and gtf, and that transcript identification was still possible by stringtie2 as part of the nanoseq pipeline, I'm not certain how to proceed further.

The csv file I used is as follows:

group,replicate,barcode,input_file,fasta,gtf
HUMAN_FTO_WT_DMSO_M,1,,/home/user/data/nanoseq/raw/PAQ63378_FTO_WT_DMSO_M/PAQ63378_FTO_WT_DMSO_M_basecalled.fastq.gz,/home/user/data/nanoseq/gencode.v44.transcripts.fa,/home/user/data/nanoseq/gencode.v44.annotation.ChrPresent.gtf
HUMAN_FTO_WT_BT2,1,,/home/user/data/nanoseq/raw/PAQ87300_FTO_WT_BT2/PAQ87300_FTO_WT_BT2_basecalled.fastq.gz,/home/user/data/nanoseq/gencode.v44.transcripts.fa,/home/user/data/nanoseq/gencode.v44.annotation.ChrPresent.gtf

Executed this command:
nextflow run nf-core/nanoseq --input /home/samuel/data/nanoseq/raw/nanoseq_pars.csv --protocol directRNA --skip_demultiplexing --skip_fusion_analysis -profile singularity

Based on this, it seems like there's an inherent incompatibility with my ref files (genocode) and what stringtie2 is able to process?

For reference, the ref gtf I created (version where naming is "1" not "Chr1"):

1       HAVANA  gene    11869   14409   .       +       .       gene_id "ENSG00000290825.1"; gene_type "lncRNA"; gene_name "DDX11L2"; level 2; tag "overlaps_pseudogene";
1       HAVANA  transcript      11869   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    12613   12721   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";
1       HAVANA  exon    13221   14409   .       +       .       gene_id "ENSG00000290825.1"; transcript_id "ENST00000456328.2"; gene_type "lncRNA"; gene_name "DDX11L2"; transcript_type "lncRNA"; transcript_name "DDX11L2-202"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; transcript_support_level "1"; tag "basic"; tag "Ensembl_canonical"; havana_transcript "OTTHUMT00000362751.1";

Would there be an issue with the gencode source and Ensembl content?

Appreciate any help.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Total alignments : 0 / featureCount Results #115

Total alignments : 0 / featureCount Results #115

cagataykos commented Jun 26, 2021

casmerifle commented Feb 25, 2024

Total alignments : 0 / featureCount Results #115

Total alignments : 0 / featureCount Results #115

Comments

cagataykos commented Jun 26, 2021

casmerifle commented Feb 25, 2024