Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question about parallel alignment of different seqence #2512

Open
metaphysicser opened this issue Mar 7, 2024 · 1 comment
Open

question about parallel alignment of different seqence #2512

metaphysicser opened this issue Mar 7, 2024 · 1 comment

Comments

@metaphysicser
Copy link

Hello, I would like to use the following code provided by you to perform parallel pairwise sequence alignment on a large number of different sequences.

#include <iostream>

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printing strings

int main()
{
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

    // dummy sequences
    TSequence seqH = "CGATT";
    TSequence seqV = "CGAAATT";

    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, TAlignedSequence(seqH));
        appendValue(seqs2, TAlignedSequence(seqV));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

Due to my lack of understanding of SeqAn, I attempted many times and eventually found that the following code meets my requirements.

#include <seqan/align_parallel.h>
#include <seqan/stream.h>  // for printint strings
// Function to generate a random DNA sequence of given length
std::string generateRandomDNASequence(int length) {
    std::string bases = "ATCG";
    std::string sequence = "";
    for (int i = 0; i < length; ++i) {
        sequence += bases[rand() % 4]; // Randomly pick a base and append to the sequence
    }
    return sequence;
}

int main(int argc, char** argv) {
    using TSequence = seqan2::String<seqan2::Dna>;
    using TAlignedSequence = seqan2::Gaps<TSequence>;
    using TThreadModel = seqan2::Parallel;
    using TVectorSpec = seqan2::Vectorial;
    using TExecPolicy = seqan2::ExecutionPolicy<TThreadModel, TVectorSpec>;

   
    seqan2::StringSet<TAlignedSequence> seqs1;
    seqan2::StringSet<TAlignedSequence> seqs2;

    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, TAlignedSequence(*new TSequence() = generateRandomDNASequence(20)));
        appendValue(seqs2, TAlignedSequence(*new TSequence() = generateRandomDNASequence(20)));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqs1, seqs2, scoreAffine);

    for (int16_t score : scores)
        std::cout << "Score: " << score << "\n";

    for (size_t pos = 0; pos < seqan2::length(seqs1); ++pos) // print out alignments
    {
        std::cout << seqs1[pos] << "\n";
        std::cout << seqs2[pos] << "\n\n";
    }

    return EXIT_SUCCESS;
}

However, I am also facing the issue of memory leaks. I don't know how to free the memory I've allocated. I tried using delete, but it failed.

Could you please tell me the correct way to add different sequences to a seqan2::StringSet and perform the alignment? Alternatively, how can I free the memory allocated for TSequence? Maybe this question seems simple, but it's really important to me! Thank you!

@SGSSGene
Copy link
Contributor

Hi @metaphysicser

The class seqan2::Gaps<TSequence> doesn't hold the sequence itself, but only decorations around a sequence. The sequence itself must be stored such that it outlives the seqan2::Gaps instances.

You could change your code as following to achieve this.
It changes seqs1 and seqs2 to store the actual sequences. And seqsWithGaps1 and seqsWithGaps2 store the decoration around the sequences.

    seqan2::StringSet<TSequence> seqs1;
    seqan2::StringSet<TSequence> seqs2;
    for (size_t i = 0; i < 100; ++i) // Create a data set of 100 dummy sequences
    {
        appendValue(seqs1, generateRandomDNASequence(20));
        appendValue(seqs2, generateRandomDNASequence(20));
    }

    seqan2::StringSet<TAlignedSequence> seqsWithGaps1;
    seqan2::StringSet<TAlignedSequence> seqsWithGaps2;
    for (size_t i = 0; i < 100; ++i) // Create wrapper around the sequences
    {
        appendValue(seqsWithGaps1, TAlignedSequence(seqs1[i]));
        appendValue(seqsWithGaps2, TAlignedSequence(seqs2[i]));
    }

    TExecPolicy execPolicy;
    setNumThreads(execPolicy, 4);

    seqan2::Score<int16_t, seqan2::Simple> scoreAffine(2, -2, -1, -4);

    seqan2::String<int16_t> scores = seqan2::globalAlignment(execPolicy, seqsWithGaps1, seqsWithGaps2, scoreAffine);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants