SequenceForge-Lite

SequenceForge-Lite is a lightweight tool designed to work with biological sequence data, providing various functionalities for filtering FASTQ files and manipulating FASTA files. Additionally, it offers utilities for parsing BLAST output files.

Features

FASTQ Filtering

  • Filter FASTQ files based on GC content, sequence length, and quality score.
  • Specify custom ranges for GC content and sequence length.
  • Set a minimum quality score threshold for sequences.
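The three filters above can be sketched as follows. This is a minimal illustration, not the package's actual API: the function names, the `gc_bounds`, `length_bounds` and `quality_threshold` parameters, the in-memory dictionary layout, and the Phred+33 quality encoding are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical in-memory layout: read name -> (sequence, Phred+33 quality string)
Reads = Dict[str, Tuple[str, str]]


def gc_percent(seq: str) -> float:
    """Return the GC content of a sequence as a percentage."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mean_quality(qual: str) -> float:
    """Return the mean Phred score, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(char) - 33 for char in qual) / len(qual)


def filter_reads(
    reads: Reads,
    gc_bounds: Tuple[float, float] = (0, 100),
    length_bounds: Tuple[int, int] = (0, 2**32),
    quality_threshold: float = 0,
) -> Reads:
    """Keep reads whose GC content, length and mean quality pass all filters."""
    kept = {}
    for name, (seq, qual) in reads.items():
        if not gc_bounds[0] <= gc_percent(seq) <= gc_bounds[1]:
            continue
        if not length_bounds[0] <= len(seq) <= length_bounds[1]:
            continue
        if mean_quality(qual) < quality_threshold:
            continue
        kept[name] = (seq, qual)
    return kept
```

Both ends of each range are inclusive in this sketch; the real tool may treat bounds differently.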

FASTA File Manipulation

  • Get quick info on each sequence in a FASTA file.
  • Convert multiline FASTA files to one-line format.
  • Shift the start position of one-line FASTA sequences by a specified amount.
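A rough sketch of the last two operations is shown below. The function names are hypothetical, the code works on lists of lines rather than files, and the shift is assumed to be a circular rotation of the sequence; the package itself may behave differently.

```python
from typing import Iterable, List


def to_oneline(lines: Iterable[str]) -> List[str]:
    """Join the wrapped sequence lines under each FASTA header into one line."""
    result: List[str] = []
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                result.append("".join(chunks))
                chunks = []
            result.append(line)
        else:
            chunks.append(line)
    if chunks:
        result.append("".join(chunks))
    return result


def shift_start(seq: str, shift: int) -> str:
    """Rotate a sequence so that it starts `shift` positions later (assumed circular)."""
    if not seq:
        return seq
    shift %= len(seq)
    return seq[shift:] + seq[:shift]
```

For example, `shift_start("ACGTTA", 2)` moves the first two bases to the end of the sequence.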

BLAST Output Parsing

  • Extract the top hit for each query from BLAST output files.
  • Results are sorted alphabetically for easy analysis.
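The idea can be illustrated with a short sketch. It assumes BLAST tabular output (`-outfmt 6`), where the first two tab-separated columns are the query ID and the subject ID and the hits for each query are listed best first. The input format the package actually expects is not stated here, and `top_hits` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def top_hits(lines: Iterable[str]) -> List[str]:
    """Return the top-hit subject ID for each query, sorted alphabetically."""
    best: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a tabular hit row
        query, subject = fields[0], fields[1]
        # Only the first row seen for each query is kept.
        best.setdefault(query, subject)
    return sorted(best.values())
```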

DNA, RNA & amino acid classes

  • Calculates GC content of DNA and RNA sequences
  • Prints the complement of a DNA sequence
  • Transcribes a DNA sequence to RNA
  • Prints an RNA sequence split into codons
  • Finds motifs in nucleic acid sequences
  • Translates an RNA sequence to an amino acid sequence naively, without regard to biological meaning
  • Calculates the molecular weight of an amino acid sequence
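A few of these operations are simple enough to sketch as plain functions. The names below are hypothetical and do not reflect the package's class interface, which is shown in the demo notebook.

```python
from typing import List

# Translation table for complementing DNA bases, preserving case.
_DNA_COMPLEMENT = str.maketrans("ATGCatgc", "TACGtacg")


def complement(dna: str) -> str:
    """Return the complement of a DNA sequence (not reversed)."""
    return dna.translate(_DNA_COMPLEMENT)


def transcribe(dna: str) -> str:
    """Transcribe DNA to RNA by replacing thymine with uracil."""
    return dna.replace("T", "U").replace("t", "u")


def codons(rna: str) -> List[str]:
    """Split an RNA sequence into consecutive triplets; the last may be shorter."""
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    if not motif:
        return []
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions
```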

Custom RandomForestClassifier

  • Self-written implementation of RandomForestClassifier
  • Supports parallelisation (runs 2 times faster when 2 threads are specified)

Installation

git clone https://github.com/iliapopov17/SequenceForge-Lite.git && cd SequenceForge-Lite
pip install -r requirements.txt

Usage Guide

  • A demonstration Python notebook is available in the demo.ipynb file
  • Demonstration data is available in the demo_data folder

🔗 Visit SequenceForge-Lite tutorial web page

Troubleshooting

Common Issues and Solutions:

  1. File Not Found Error:
  • Issue: The script raises a FileNotFoundError when trying to access the input file.
  • Solution: Verify that the input file path provided to the function is correct and the file exists in the specified location.
  2. Incorrect File Format:
  • Issue: The function fails to process the file due to incorrect formatting.
  • Solution: Ensure that the input files are properly formatted according to the specifications mentioned in the function or class descriptions.
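Both issues can be caught before calling any function with a small pre-flight check. The helper below is a hypothetical sketch and is not part of the package. It relies only on the fact that a FASTA file starts with a `>` header line and a FASTQ file starts with an `@` header line.

```python
from pathlib import Path


def check_input(path: str, expected_first_char: str) -> Path:
    """Raise a descriptive error if the file is missing or looks misformatted."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Input file not found: {file.resolve()}")
    with file.open() as handle:
        first_line = handle.readline()
    if not first_line.startswith(expected_first_char):
        raise ValueError(
            f"{file.name}: expected the first line to start with "
            f"{expected_first_char!r}, got {first_line[:20]!r}"
        )
    return file
```

Calling `check_input("reads.fastq", "@")` or `check_input("genome.fasta", ">")` (file names chosen for illustration) then fails early with a message that names the offending file.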

Contributing

Contributions are welcome! If you have any ideas, bug fixes, or enhancements, feel free to open an issue or submit a pull request.

Contact

For any inquiries or support, feel free to contact me via email.

Happy sequencing! 🧬🔬