Analysis of input sequence data.
This repository contains generic standalone tools for sequence analysis.
The alphabet metagene code analyzes sequences over various alphabets (DNA, RNA, protein, or another alphabet) and outputs the proportion of the chosen letter(s) along the binned sequences.
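The binning idea behind a letter-proportion metagene can be sketched as follows. This is a minimal Python illustration, not the repository's R implementation. It assumes that each sequence is split into a fixed number of near-equal bins, so sequences of different lengths are rescaled to the same number of bins, and that per-bin proportions are then averaged across sequences. The names `bin_boundaries`, `binned_proportions`, `metagene` and the `n_bins` parameter are hypothetical.

```python
import math
from collections import Counter


def bin_boundaries(length, n_bins):
    """Split positions 0..length-1 into n_bins contiguous, near-equal bins."""
    return [(i * length // n_bins, (i + 1) * length // n_bins) for i in range(n_bins)]


def binned_proportions(seq, letters, n_bins):
    """Per-bin fraction of positions whose letter is one of `letters`.

    Bins that receive no positions (sequence shorter than n_bins) are NaN.
    """
    seq = seq.upper()
    targets = set(letters.upper())  # a set avoids double counting, e.g. "GG"
    props = []
    for start, end in bin_boundaries(len(seq), n_bins):
        chunk = seq[start:end]
        if not chunk:
            props.append(math.nan)
            continue
        counts = Counter(chunk)
        props.append(sum(counts[letter] for letter in targets) / len(chunk))
    return props


def metagene(seqs, letters, n_bins=10):
    """Average the per-bin proportions across sequences, ignoring NaN bins."""
    profiles = [binned_proportions(s, letters, n_bins) for s in seqs]
    result = []
    for b in range(n_bins):
        values = [p[b] for p in profiles if not math.isnan(p[b])]
        result.append(sum(values) / len(values) if values else math.nan)
    return result


if __name__ == "__main__":
    # GC proportion in two bins for two DNA sequences of different lengths
    print(metagene(["GGGGAAAA", "GAGA"], "GC", n_bins=2))  # [0.75, 0.25]
```

Scaling every sequence to the same number of bins is what lets sequences of varying lengths be averaged position by position; the real script may choose bins differently.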
Input:
- FASTA file(s) containing sequences of equal or varying lengths.
- Output file path
- File(s) matching individual sequences in the first (sample) FASTA file to sequences in the other (control) FASTA file(s); an example is provided. Ignore this option when a single FASTA file is analyzed. When multiple FASTA files are used, these files enable a paired Wilcoxon (signed-rank) test instead of the unpaired Wilcoxon rank-sum test used by default.
- Alphabet to consider (DNA is the default).
- Sample names
- Title
- Plot colors
- Additional options, which can be listed with the -h option
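The difference between the paired and unpaired comparisons described for the matching files can be sketched in Python. This is an illustration only. The matching-file layout assumed here (one tab-separated `sample_id<TAB>control_id` pair per line) is hypothetical, since the provided example file is not reproduced in this section. The sketch computes only the test statistics: the rank-sum W (Mann-Whitney U) and the signed-rank V, as R's `wilcox.test` reports them. It computes no p-values, and the function names and values are made up for illustration.

```python
import csv
import io


def read_matching(handle):
    """Parse a hypothetical two-column, tab-separated matching file."""
    pairs = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):
            pairs.append((row[0], row[1]))
    return pairs


def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_u(x, y):
    """Unpaired Wilcoxon rank-sum statistic W (= Mann-Whitney U) for x."""
    ranks = rank_with_ties(list(x) + list(y))
    return sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2


def signed_rank_v(x, y):
    """Paired Wilcoxon signed-rank statistic V: sum of ranks of positive diffs."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    ranks = rank_with_ties([abs(d) for d in diffs])
    return sum(r for r, d in zip(ranks, diffs) if d > 0)


if __name__ == "__main__":
    # Made-up per-sequence values (e.g. GC content) keyed by sequence ID
    sample = {"s1": 0.62, "s2": 0.55, "s3": 0.70}
    control = {"c1": 0.50, "c2": 0.58, "c3": 0.40}
    pairs = read_matching(io.StringIO("s1\tc1\ns2\tc2\ns3\tc3\n"))
    x = [sample[s] for s, _ in pairs]
    y = [control[c] for _, c in pairs]
    print("unpaired W:", rank_sum_u(x, y))  # 8.0
    print("paired V:", signed_rank_v(x, y))  # 5.0
```

The paired statistic uses only the within-pair differences, which is why a sequence-to-sequence matching is required. In R, `wilcox.test(x, y)` and `wilcox.test(x, y, paired = TRUE)` compute these statistics together with p-values.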
Requirements:
- R >= 3.3.2
- gcc >= 9.2.0
- optparse
- plyr
- stringr
- reshape2
- pheatmap
- gridExtra
overallAlphabetContent.R
View documentation:
Rscript overallAlphabetContent.R -h
Running example:
Rscript overallAlphabetContent.R -f fasta_file1_path.fa,fasta_file2_control_path.fa,fasta_fileN_control_path.fa -m matching_first_file_to_control2.txt,matching_first_file_to_controlN.txt -t 'title' -s name_file1,name_file2,name_fileN -r png -o output_path/file_name.png
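Judging only by its name, overallAlphabetContent.R summarizes letter content over whole sequences rather than along bins. Under that assumption, a minimal Python sketch of reading a multi-sequence FASTA file and computing each sequence's overall letter proportions could look like this. `read_fasta`, `overall_content`, and the default `ACGT` alphabet used here are hypothetical and are not taken from the repository.

```python
import io

DNA = "ACGT"  # assumed default alphabet, mirroring "DNA is the default"


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    Multi-line sequences are joined and upper-cased; any text before the
    first '>' header is ignored.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def overall_content(seq, alphabet=DNA):
    """Proportion of each alphabet letter over the whole (unbinned) sequence.

    The denominator is the full sequence length, so letters outside the
    alphabet (e.g. N) lower every proportion instead of being dropped.
    """
    if not seq:
        return {letter: 0.0 for letter in alphabet}
    return {letter: seq.count(letter) / len(seq) for letter in alphabet}


if __name__ == "__main__":
    fasta = io.StringIO(">seq1\nACGT\nGG\n>seq2\nAANN\n")
    for name, seq in read_fasta(fasta):
        print(name, overall_content(seq))
```

Whether ambiguous letters such as N should count toward the denominator is a design choice; the real script may handle them differently.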
Output example:
alphabetMetagene.R
View documentation:
Rscript alphabetMetagene.R -h
Running example:
Rscript alphabetMetagene.R -f fasta_file1_path.fa,fasta_file2_control_path.fa,fasta_fileN_control_path.fa -m matching_first_file_to_control2.txt,matching_first_file_to_controlN.txt -s name_file1,name_file2,name_fileN -o output_path/file_name.pdf
Output example: