Calling variants using IsoPhase from Iso-Seq data #198

Yexin-Zhang · 2022-02-28T19:55:34Z

Hi,

I was trying to use IsoPhase do call SNPs from the Pacbio Isoseq data for a single sample. However, when I run the select_loci_to_phase.py with all the files required, I was keeping getting 0 loci.

The GFF file is obtained after running SQANTI3 filtering

11      PacBio  transcript      63617628        63627350        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
11      PacBio  exon    63617628        63619094        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
11      PacBio  exon    63620065        63620182        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
11      PacBio  exon    63622845        63623057        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
11      PacBio  exon    63623737        63623954        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
11      PacBio  exon    63627298        63627350        .       -       .       transcript_id "PB.69.1"; gene_id "PB.69";
12      PacBio  transcript      20211007        20216155        .       +       .       transcript_id "PB.76.2"; gene_id "PB.76";
12      PacBio  exon    20211007        20211486        .       +       .       transcript_id "PB.76.2"; gene_id "PB.76";
12      PacBio  exon    20214033        20214764        .       +       .       transcript_id "PB.76.2"; gene_id "PB.76";
12      PacBio  exon    20215256        20216155        .       +       .       transcript_id "PB.76.2"; gene_id "PB.76";

The FASTQ file is obtained after doing CCS analyses - LIMA - Isoseq refine, but before clustering. The headers of each sequence are in the format @m64319e_211227_144417/3211320/ccs.
The read_stat file is the count information after collapse obtained by running get_abundance_post_collapse.py

id      is_fl   stat    pbid
m64319e_211227_144417/3211320/ccs       Y       unique  PB.1.1
m64319e_211227_144417/65145123/ccs      Y       unique  PB.1.1
m64319e_211227_144417/111935771/ccs     Y       unique  PB.2.1
m64319e_211227_144417/67240066/ccs      Y       unique  PB.2.1
m64319e_211227_144417/123208230/ccs     Y       unique  PB.3.1
m64319e_211227_144417/160236121/ccs     Y       unique  PB.3.1
m64319e_211227_144417/42860688/ccs      Y       unique  PB.3.1

I have also manually checked that the coverage at some loci is over 40, even some isoforms (e.g. P.B.78.13) are having support from more than 40 reads.

It was very mysterious since when I used the exactly same reference genome and GFF file, but changed the FASTQ file to sample1..FL.RE.CL.hq.fasta (obtained after doing clustering, headers are formated as >transcript/0 full_length_coverage=4;length=5054), as well as changed the read_stat file into a new file converted using python <path_to>/phasing/utils/convert_group_to_read_stat_file.py, everything worked fine.

The new read_stat file is like:

id      pbid    is_fl   stat
transcript/304  PB.1.1  Y       unique
transcript/3111 PB.2.1  Y       unique
transcript/3713 PB.3.1  Y       unique
transcript/3605 PB.4.1  Y       unique
transcript/2086 PB.5.1  Y       unique
transcript/2546 PB.6.1  Y       unique
transcript/1800 PB.7.1  Y       unique

I was very confused, since I was hoping the phasing step would take the information of the full-length read sequences before clustering into isoform sequences. It would be very helpful if you could help with this problem. Thank you.

Monica

The text was updated successfully, but these errors were encountered:

cpaisie · 2024-03-04T15:54:31Z

@Yexin-Zhang did you use the --fq flag in the command when you ran select_loci_to_phase.py? I had the same problem as you, but when I added --fq to the commands, everything ran as expected.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling variants using IsoPhase from Iso-Seq data #198

Calling variants using IsoPhase from Iso-Seq data #198

Yexin-Zhang commented Feb 28, 2022 •

edited

cpaisie commented Mar 4, 2024

Calling variants using IsoPhase from Iso-Seq data #198

Calling variants using IsoPhase from Iso-Seq data #198

Comments

Yexin-Zhang commented Feb 28, 2022 • edited

cpaisie commented Mar 4, 2024

Yexin-Zhang commented Feb 28, 2022 •

edited