FuSe

Functional grouping of transcripts for RNA-Seq analyses

After cloning the repository, first extract the bi_do archive, which is shipped in several parts (bi_do.part0*):

On Linux
cat bi_do.part0* > ~/bi_do.rar
unrar e ~/bi_do.rar

On Windows
Extract using WinRAR: right-click any of the .rar parts (bi_do.part0*) and choose "Extract Here".

Environment setup: FuSe requires Perl (v5.26.1 or higher) and several Perl modules. Before using FuSe for the first time, install the modules by running the following command from the command prompt.

cpan File::Find::Rule Storable List::Util List::MoreUtils Array::Utils Data::Dumper Getopt::Long JSON::XS File::Slurp
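Before the first run, it can help to confirm that the Perl version and the modules are actually in place. The following is a minimal sketch of such a check; the variable names (modules, missing, checked) are illustrative and not part of FuSe.

```shell
# Modules listed in the cpan command above.
modules="File::Find::Rule Storable List::Util List::MoreUtils Array::Utils Data::Dumper Getopt::Long JSON::XS File::Slurp"

# "use v5.26.1" makes perl exit with an error on older interpreters.
perl -e 'use v5.26.1;' 2>/dev/null || echo "Perl v5.26.1 or higher not found"

missing=""
checked=0
for m in $modules; do
  # -M loads the module and "-e 1" is a no-op program,
  # so a non-zero exit means the module could not be loaded.
  perl -M"$m" -e 1 2>/dev/null || missing="$missing $m"
  checked=$((checked + 1))
done

if [ -n "$missing" ]; then
  echo "Could not load:$missing"
else
  echo "All $checked modules found"
fi
```

Any module reported as not loadable can be installed individually with cpan.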

Another prerequisite for FuSe is a relationship file mapping Ensembl gene, transcript and protein IDs, which can be obtained from Biomart; see FuSe/data/sample_biomart.txt for an example.
Note: Make sure that the Biomart relationship file is from the same version as the one used for genome alignment.

Using precomputed BLAST Interpro data object (bi_do)

Protein pairs: All protein pair combinations are created and both confidence scores (KS and DS) are calculated for them. If you used the Ensembl Homo_sapiens.GRCh38 for alignment, the precomputed data object bi_do.data can be used for the analyses.
Usage:
perl /path/to/script/cal_pp_conf.pl --help
perl /path/to/script/cal_pp_conf.pl --rel /path/to/file/Biomart_rel.txt --bi_do /path/to/data_object/bi_do.data --ss /path/to/ss/scoring_scheme.txt --out_path /path/to/outfile/ --pp_do prot_pairs.data

SFPGs: Overlapping protein pairs whose score is over the given CSC (--csc) are used to create SFPGs. The confidence of an SFPG is the average of its protein pair scores. One SFPG is formed for each protein-coding transcript that has other similar protein-coding transcripts.
Usage:
perl /path/to/script/make_sfpgs.pl --help
perl /path/to/script/make_sfpgs.pl --pp_do /path/to/file/prot_pairs.data --score_type KS --csc 95 --out_path /path/to/outfile/ --sfpg sfpg.data
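The cutoff-and-average idea behind the SFPG confidence can be illustrated on a toy table. This is only a sketch under stated assumptions: the three-column text format (protein A, protein B, score), the protein names, and the grouping by the first column are hypothetical, and FuSe itself works on the prot_pairs.data object rather than on a text table.

```shell
# Toy protein-pair table (hypothetical format: protein A, protein B, score).
pairs=$(printf 'P1\tP2\t98\nP1\tP3\t96\nP2\tP3\t90\nP4\tP5\t99\n')
csc=95   # same cutoff value as in the usage line above

# Keep pairs scoring over the cutoff, then average the retained scores
# per seed protein (column 1) to get one confidence value per group.
sfpg_conf=$(printf '%s\n' "$pairs" | awk -F'\t' -v csc="$csc" '
  $3 > csc { sum[$1] += $3; n[$1]++ }
  END { for (p in sum) printf "%s\t%.1f\n", p, sum[p] / n[p] }
' | sort)

printf '%s\n' "$sfpg_conf"
```

Here the P2/P3 pair (score 90) falls below the cutoff and is dropped, so only the groups seeded by P1 and P4 remain.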

SFPG expression: Expression is calculated for all SFPGs from the normalized FPKM values of the samples and the SFPG data. It can be calculated using one of two distributions: equal distribution (ED) or group size distribution (GD). To compute normalized FPKM, see extra/norm_fpkm.r.
Usage:
perl /path/to/script/recal_expression.pl --help
perl /path/to/script/recal_expression.pl --input /path/to/file/exp_file.txt --type 2 --sfpg /path/to/file/sfpg.data --out_path /path/to/outfile/ --recal recal_exp.txt

Creating your own BLAST Interpro data object (bi_do)

Data preparation: Sequence alignment, protein domain, motif and family information are required to make the bi_do. Protein sequences can be obtained from Ensembl. BLAST+ and Interpro were run as explained in their user manuals. For BLAST+, a protein reference database was first created from all sequences, and all sequences were then aligned against it. The alignment results were obtained in output format 7 with the following fields:
-outfmt "7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen"
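The all-against-all BLAST+ step described above could be run roughly as follows. This is a sketch, not the exact commands used to build the shipped bi_do: the file names proteins.fa, protdb and blast.txt are placeholders, and any further BLAST+ options (e-value threshold, threads and so on) are left at their defaults.

```shell
proteins="proteins.fa"   # placeholder: FASTA file with all protein sequences
db="protdb"              # placeholder: name of the BLAST database to create
outfmt="7 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen"

if command -v makeblastdb >/dev/null 2>&1 && command -v blastp >/dev/null 2>&1 && [ -s "$proteins" ]; then
  # Build a protein database from all sequences.
  makeblastdb -in "$proteins" -dbtype prot -out "$db"
  # Align all sequences against that database, writing commented tabular output.
  blastp -query "$proteins" -db "$db" -outfmt "$outfmt" -out blast.txt
else
  echo "BLAST+ or $proteins not available; skipping"
fi
```

The resulting blast.txt is the kind of file that is passed to preprocessing_data.pl via --blast.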

The results from Interpro were obtained in .tsv format using the default values. To use FuSe with future Ensembl updates, regenerate the data as explained in this step.

Preprocessing: The data generated from BLAST+ and Interpro were put together to create the bi_do.
Usage:
perl /path/to/script/preprocessing_data.pl --help
perl /path/to/script/preprocessing_data.pl --interpro /path/to/file/interpro.tsv --blast /path/to/file/blast.txt --out_path /path/to/outfile/ --bi_do bi_do.data

rajinder4489/FuSe

Folders and files

Latest commit

History

Repository files navigation

FuSe

Functional grouping of transcripts for RNA-Seq analyses

Using precomputed BLAST Interpro data object (bi_do)

Creating your own BLAST Interpro data object (bi_do)

About

Resources

Stars

Watchers

Forks

Languages