Generating kmers from annotation files #19

Open
drtamermansour opened this issue Feb 24, 2021 · 0 comments

drtamermansour commented Feb 24, 2021

Annotation files include GFF, GTF and BED files. We can use any of these files to generate k-mers in 3 main scenarios:

  1. If these files are annotations of transcriptomes: We can use gffread (for GFF or GTF files) or getfasta from bedtools (for GFF or BED files). Note: getfasta in bedtools has 2 related arguments (-split and -rna). We need to examine their effect carefully.

  2. If the user does not want splicing to happen:
    a. If we have a BED file that annotates genomic blocks: getfasta from bedtools is straightforward.
    b. If we have a transcriptome annotation file but the user needs each exon as a separate entry: We need to convert the GFF or GTF to BED, then we can use getfasta from bedtools as in (a).

## gffread can convert GFF to GTF  
gffread example.gff  -T -o example.gtf

##  UCSC_kent_commands has a binary tool to convert gtf to GenePred format 
wget https://github.com/drtamermansour/horse_trans/raw/master/scripts/UCSC_kent_commands/gtfToGenePred
chmod +x gtfToGenePred
./gtfToGenePred example.gtf example.gpred

## I have a script (I do not remember where I got it from) that converts GenePred to a BED file
wget https://raw.githubusercontent.com/drtamermansour/horse_trans/master/scripts/genePredToBed
chmod +x genePredToBed
cat example.gpred | ./genePredToBed > example.bed
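
Once example.bed exists, the sequence extraction for scenarios 1 and 2b could look roughly like the sketch below. It assumes a genome FASTA called genome.fa (a placeholder, not named in this issue) and that example.bed is in BED12 format; the function names are hypothetical. With `-split`, getfasta concatenates the BED12 blocks into one spliced sequence per transcript, while `bed12tobed6` first breaks each record into one line per block so that every exon becomes its own FASTA entry. The exact behaviour of `-name`, `-s` and `-rna` differs between bedtools versions and still needs to be checked.

```bash
# Placeholder inputs: $1 = annotation file, $2 = genome FASTA (assumed: genome.fa), $3 = output FASTA.

# Scenario 1 with gffread: spliced transcript sequences straight from GFF/GTF.
gffread_transcripts() {
  gffread -w "$3" -g "$2" "$1"
}

# Scenario 1 with bedtools: one spliced sequence per BED12 record.
spliced_transcripts() {
  bedtools getfasta -fi "$2" -bed "$1" -split -s -name -fo "$3"
}

# Scenario 2b: one sequence per exon; split BED12 into one BED6 line per block first.
separate_exons() {
  bedtools bed12tobed6 -i "$1" > "${1%.bed}.exons.bed"
  bedtools getfasta -fi "$2" -bed "${1%.bed}.exons.bed" -s -name -fo "$3"
}

# Usage:
#   gffread_transcripts example.gtf genome.fa transcripts.fa
#   spliced_transcripts example.bed genome.fa transcripts.fa
#   separate_exons      example.bed genome.fa exons.fa
```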
  3. If we have a transcriptome annotation file but the user needs to generate k-mers from non-exonic structures (e.g. introns, upstream sequences, downstream sequences, exon-exon junctions): We can transform the annotation files to BED files, then we need to create a simple script that transforms this transcriptome BED file into another BED file representing the user's target loci.
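
As one illustration of such a transformation script, introns can be derived from the block columns of a BED12 file. This is only a sketch: it assumes the transcriptome BED file is tab-separated BED12 (columns 10 to 12 holding blockCount, blockSizes and blockStarts), and the function name `bed12_to_introns` and the `_intronN` naming scheme are made up for this example. Upstream or downstream regions and exon-exon junctions would need their own, similar rules.

```bash
# Hypothetical helper: print one BED6 line per intron of each multi-exon BED12 record.
# Reads the files given as arguments, or stdin when none are given.
bed12_to_introns() {
  awk 'BEGIN { FS = OFS = "\t" }
  $10 > 1 {
    split($11, sizes, ",")    # blockSizes (exon lengths)
    split($12, starts, ",")   # blockStarts (offsets relative to chromStart)
    for (i = 1; i < $10; i++) {
      intron_start = $2 + starts[i] + sizes[i]   # end of exon i
      intron_end   = $2 + starts[i + 1]          # start of exon i+1
      print $1, intron_start, intron_end, $4 "_intron" i, 0, $6
    }
  }' "$@"
}

# Usage:
#   bed12_to_introns example.bed > example.introns.bed
```

The resulting BED file can be passed to getfasta in the same way as a BED file of genomic blocks.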