Sets with overlapping primers #39

Open
ClarkLabUCB opened this issue Mar 8, 2017 · 3 comments
Comments

@ClarkLabUCB

I am getting sets that have overlapping primers (see below for full output):

For example:
CGAATCGTTCTA
GCGAATCGTTCT

Is this allowed in swga, am I setting the parameters wrong, or could it be a bug?

PRIMER SUMMARY

There are 43704 primers in the database.

500 are marked as active (i.e., they passed filter steps and will be
used to find sets of compatible primers.)

The average number of foreground genome binding sites is 1.
(avg binding / genome_length = 0.000110)
The average number of background genome binding sites is 4727.
(avg binding / genome_length = 0.000001)

The melting temp of the primers ranges between 49.50C and 64.81C with an average of 58.54C.

SETS SUMMARY

There are 84463 sets in the database.
The best scoring set is #7111, with 10 primers and a score of 0.000011.
Various statistics:

  • set_size........: 10
  • bg_dist_mean....: 4.61052E+07
  • fg_max_dist.....: 3,584
  • fg_dist_mean....: 834.545
  • fg_dist_std.....: 1,082.58
  • fg_dist_gini....: 0.596059
  • scoring_fn......: (fg_dist_mean * fg_dist_gini) / (bg_dist_mean)
The primers in Set 7111 are:
ACTGCGAATCGT, AGACCGGTTCTA, AGGCGTTACTCG, CGAATCGTTCTA, GCGAATCGTTCT, GGCGTTACTCGA, GTAGCAAGCTCG, GTCTATCGGCTC, TACCGTCAGCGT, TCCGCAGATCGT
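As a quick sanity check, the reported score can be reproduced from the statistics listed above by plugging them into the `scoring_fn` expression. This is only a worked calculation using the numbers in the output; the variable names mirror the report labels and are not taken from swga's code.

```python
# Values copied from the SETS SUMMARY output for set #7111.
fg_dist_mean = 834.545      # mean distance between foreground binding sites
fg_dist_gini = 0.596059     # Gini index of foreground binding distances
bg_dist_mean = 4.61052e7    # mean distance between background binding sites

# scoring_fn as printed: (fg_dist_mean * fg_dist_gini) / (bg_dist_mean)
score = (fg_dist_mean * fg_dist_gini) / bg_dist_mean

print(f"{score:.6f}")  # prints 0.000011, matching the reported score
```

Lower is better under this function: closely and evenly spaced foreground sites shrink the numerator, and sparse background binding grows the denominator.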
@eclarke
Owner

eclarke commented Mar 8, 2017

Hi Iain,

We check to make sure the primers don't have more than a few complementary bases between them to avoid primer dimers, and check to make sure one primer is not a complete subset of another. The primers you've identified look very similar, but we don't know for sure that they actually land on the same spot in the genome (since their sequences aren't identical or subsets of each other).

In short, it's okay to have overlapping primers, just not subsets or regions of complementarity.
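To make the distinction concrete, here is a minimal Python sketch of the three relationships discussed: one primer contained in another, two primers that are shifted copies of each other, and complementarity between primers. It is not swga's implementation; the function names and the use of "longest common run with the reverse complement" as a stand-in for dimer potential are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_contained(a, b):
    """True if the shorter primer occurs entirely inside the longer one."""
    short, long_ = sorted((a, b), key=len)
    return short in long_

def shifted_overlap(a, b):
    """Longest suffix of one primer that is also a prefix of the other."""
    best = 0
    for x, y in ((a, b), (b, a)):
        for k in range(1, min(len(x), len(y))):
            if x[-k:] == y[:k]:
                best = max(best, k)
    return best

def max_complementary_run(a, b):
    """Longest stretch of `a` that matches the reverse complement of `b`
    (a rough proxy for primer-dimer potential, assumed for this sketch)."""
    rc = b.translate(COMPLEMENT)[::-1]
    best = 0
    for i in range(len(a)):
        for j in range(len(rc)):
            k = 0
            while i + k < len(a) and j + k < len(rc) and a[i + k] == rc[j + k]:
                k += 1
            best = max(best, k)
    return best

p1, p2 = "CGAATCGTTCTA", "GCGAATCGTTCT"
print(is_contained(p1, p2))     # False: neither primer contains the other
print(shifted_overlap(p1, p2))  # 11: the primers are offset by one base
```

For the pair from the report, neither primer is contained in the other, so a containment check lets both through even though they share 11 of 12 bases when shifted by one position.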

Erik

@ClarkLabUCB
Author

ClarkLabUCB commented Mar 8, 2017 via email

@eclarke
Owner

eclarke commented Mar 9, 2017

While we've thought about excluding primers that are this similar, there will be situations where being off by one or two bases does actually change their binding pattern in the foreground genome. I don't think it'd negatively affect them, and your solution of increasing the maximum set size (or even just excluding the overlapping primers) would be reasonable.

For your other questions:

  1. I would choose an annealing temperature around the operating temperature of your enzyme. We haven't tested the method with enzymes other than phi29, though, so I'm not sure how the multiple displacement amplification would work at higher temperatures or with different enzymes.

  2. I believe that Ns are ignored for kmer formation (but I need to test exactly what dsk, the program we use for kmer counting, does). They are preserved when calculating distances.

Best,
Erik
