Skip to content
This repository has been archived by the owner on May 3, 2024. It is now read-only.

demux_isoseq_with_genome.py Issue - expecting input not generated by get_abundance_post_collapse.py #223

Open
KrRi4 opened this issue May 19, 2022 · 0 comments

Comments

@KrRi4
Copy link

KrRi4 commented May 19, 2022

Hi,

I was hoping someone could help me solve/clarify the issue I'm having below.

First, I am running cDNA_Cupcake v28.0.0, taken from README:

cDNA_Cupcake

Last Updated: 09.15.2021
Current version: 28.0.0

I have 2 IsoSeq CCS files that I ran through steps 1-7 outlined here: #167. I would like to obtain the sample specific counts which I believe can be generated using the demux_isoseq_with_genome.py script.

In order to obtain the read stat file for the above script, I ran get_abundance_post_collapse.py.

Here are the first few lines of the required input files for demux_isoseq_with_genome.py script:

collapsed.mapped.read_stat.txt
id is_fl stat pbid
1_m54328U_191216_220556/90243745/ccs Y unique PB.7953.2
1_m54328U_191216_220556/64751804/ccs Y unique PB.7953.2
1_m54328U_191216_220556/83690194/ccs Y unique PB.7953.2
1_m54328U_191216_220556/53478253/ccs Y unique PB.5007.3
2_m54328U_191216_220556/115672586/ccs Y unique PB.642.8

all_report_flnc.csv
id,strand,fivelen,threelen,polyAlen,insertlen,primer
1_m54328U_191216_220556/4/ccs,+,22,24,33,4386,1_NEB_5p--1_NEB_Clontech_3p
1_m54328U_191216_220556/11/ccs,+,21,24,30,5525,1_NEB_5p--1_NEB_Clontech_3p
1_m54328U_191216_220556/12/ccs,+,22,25,32,4805,1_NEB_5p--1_NEB_Clontech_3p
1_m54328U_191216_220556/14/ccs,+,25,22,30,7448,1_NEB_5p--1_NEB_Clontech_3p
2_m54328U_191216_220556/180554526/ccs,+,22,25,29,3578,2_NEB_5p--2_NEB_Clontech_3p

collapsed_mapped.collapsed.rep.fq
@PB.1.1|chr1:14401-29364(-)|transcript/77472 transcript/77472
GAGAGCGGAAGC....
@PB.1.2|chr1:14401-29359(-)|transcript/121124 transcript/121124
GGAAGCGG....

I get the following warning when I run the script... and the resulting output file is empty.

demux_isoseq_with_genome.py --mapped_fafq collapsed_mapped.collapsed.rep.fq --read_stat collapsed_mapped.collapsed.read_stat.txt --classify_csv all_report_flnc.csv -o test
Reading all_report_flnc.csv....
Reading collapsed_mapped.collapsed.read_stat.txt....
Reading collapsed_mapped.collapsed.rep.fq....
WARNING: Could not find PB.1.1|chr1:14401-29363(-)|transcript/77472 in collapsed_mapped.collapsed.read_stat.txt. No demux output for this sequence!
WARNING: Could not find PB.1.2|chr1:14401-29355(-)|transcript/121124 in collapsed_mapped.collapsed.read_stat.txt. No demux output for this sequence!

It appears the script is looking for the entire P.B.x.y.|chr... string in the read.stat file.
Now, if I remove everything after the P.B.x.y with sed 's/|[a-z].*//g' collapsed_mapped.collapsed.rep.fq > testingSyntax.fq and use testingSnytax.fq for the --mapped file in demux_isoseq_with_genome.py it works!

I wanted to make sure I haven't done something off with this process up until this point, or if this is an issue with the script? So any clarification would be helpful.

Thank you!
Kr

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant