Alternative assembly implementation #165

Open
manulera opened this issue Dec 5, 2023 · 3 comments
Comments

@manulera
Collaborator

manulera commented Dec 5, 2023

Hi @BjornFJohansson, I had started working on a refactoring of the assemblies similar to #157 (one function finds which assemblies can be made, another executes them). To get this to work I had to change things significantly, so my implementation has become quite different. I still use a graph, but the meaning of the edges is different. I have not implemented the Contig functionality, but I could do something similar if you think it would make sense to include this assembly in pydna. For now, it lives in the ShareYourCloning repo.

You can find the implementation in this_file. It passes all the tests that pydna's assembly passes, with two differences: fragments may be returned in a different order, and only unique assemblies are returned (see test file).

If you want to have a look, I would start from the docstring of Assembly and then read add_edges_from_match to understand how it works. I think it is considerably simpler than the original implementation. For example, the construction of the graph is very short (the function add_edges_from_match is quite short as well):

def __init__(self, frags: list[_Dseqrecord], limit=25, algorithm=common_sub_strings, use_fragment_order=True, use_all_fragments=False):
    # TODO: allow for the same fragment to be included more than once?
    G = _nx.MultiDiGraph()
    # Add positive and negative nodes for forward and reverse fragments
    G.add_nodes_from((i + 1, {'seq': f}) for (i, f) in enumerate(frags))
    G.add_nodes_from((-(i + 1), {'seq': f.reverse_complement()}) for (i, f) in enumerate(frags))

    # Iterate over all possible combinations of fragments
    edge_pairs = _itertools.combinations(filter(lambda x : x>0, G.nodes), 2)
    for index_first, index_secnd in edge_pairs:
        first = G.nodes[index_first]['seq']
        secnd = G.nodes[index_secnd]['seq']

        # Overlaps where both fragments are in the forward orientation
        matches_fwd = algorithm(str(first.seq).upper(), str(secnd.seq).upper(), limit)
        for match in matches_fwd:
            add_edges_from_match(match, index_first, index_secnd, first, secnd, G)

        # Overlaps where the first fragment is in the forward orientation and the second in the reverse orientation
        matches_rvs = algorithm(str(first.seq).upper(), reverse_complement(str(secnd.seq).upper()), limit)
        for match in matches_rvs:
            add_edges_from_match(match, index_first, -index_secnd, first, secnd, G)
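The signed-node scheme in the constructor above can be tried out without pydna or networkx. The following is a minimal standard-library sketch, not the implementation from the repo. The names `reverse_complement`, `longest_shared` and `find_edges` are hypothetical, and the `(start_in_first, start_in_second, length)` match format is an assumption. Unlike an algorithm that returns every common substring, `longest_shared` reports only the single longest match per pair. The sketch also records one plain edge tuple per match and makes no attempt to reproduce what `add_edges_from_match` does.

```python
from difflib import SequenceMatcher
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_shared(x: str, y: str, limit: int):
    """Hypothetical stand-in for the `algorithm` argument.

    Returns [(start_x, start_y, length)] for the longest common substring,
    or [] if it is shorter than `limit`.
    """
    sm = SequenceMatcher(None, x, y, autojunk=False)
    m = sm.find_longest_match(0, len(x), 0, len(y))
    return [(m.a, m.b, m.size)] if m.size >= limit else []


def find_edges(frags, limit=25, algorithm=longest_shared):
    """Collect (node_u, node_v, match) tuples for every pair of fragments.

    Node i + 1 is fragment i in the forward orientation, node -(i + 1) is
    its reverse complement, mirroring the node numbering shown above.
    """
    nodes = {}
    for i, f in enumerate(frags):
        nodes[i + 1] = f.upper()
        nodes[-(i + 1)] = reverse_complement(f)

    edges = []
    for u, v in combinations([n for n in nodes if n > 0], 2):
        # Both fragments in the forward orientation
        for match in algorithm(nodes[u], nodes[v], limit):
            edges.append((u, v, match))
        # First fragment forward, second fragment reverse-complemented
        for match in algorithm(nodes[u], nodes[-v], limit):
            edges.append((u, -v, match))
    return edges


# The last 8 bases of the first fragment equal the first 8 of the second
print(find_edges(["TTTTACGTACGA", "ACGTACGAGGGG"], limit=8))
# [(1, 2, (4, 0, 8))]
```

Passing the second fragment as its reverse complement (`"CCCCTCGTACGT"`) yields an edge to node `-2` instead, which is how a negative node index encodes orientation in this sketch.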

Let me know what you think.

@BjornFJohansson
Owner

OK, lots to digest. In principle I don't mind putting this alongside the existing code and eventually replacing it. I'll read it carefully and get back to you.

@manulera
Collaborator Author

Yes, actually the code has changed slightly since then, although it is still in the same spirit. I am currently using the new assembly module for PCR, ligations (including partial ones) and homologous recombination, and I will adapt it for Gibson and Golden Gate as well. For each case I just write a slightly different algorithm function.
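To make the idea of one algorithm function per assembly type concrete, here is a hedged sketch of a callable with the same `(seqx, seqy, limit)` call shape used in the constructor earlier in the thread. `terminal_overlap` is a hypothetical name and the returned `(start_x, start_y, length)` tuple format is an assumption. It accepts only overlaps where the end of the first sequence equals the start of the second, which is one plausible restriction for end-joining methods; the thread does not say how the actual functions are written.

```python
def terminal_overlap(seqx: str, seqy: str, limit: int = 25):
    """Hypothetical algorithm function restricted to terminal overlaps.

    Returns [(start_x, 0, length)] for the longest suffix of seqx that is
    also a prefix of seqy, provided it is at least `limit` long, else [].
    """
    max_len = min(len(seqx), len(seqy))
    # Try the longest candidate first; never test a zero-length overlap
    for length in range(max_len, max(limit, 1) - 1, -1):
        if seqx[-length:] == seqy[:length]:
            return [(len(seqx) - length, 0, length)]
    return []


# The shared region sits at the end of the first and start of the second
print(terminal_overlap("TTTTACGTACGA", "ACGTACGAGGGG", limit=8))
# [(4, 0, 8)]

# Swapping the arguments gives no terminal overlap
print(terminal_overlap("ACGTACGAGGGG", "TTTTACGTACGA", limit=8))
# []
```

Because the call shape is unchanged, a function like this could be passed wherever `algorithm=common_sub_strings` is the default, assuming the downstream code expects the same match format.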

If you think it would be useful, I can give you a short overview on Zoom before you dive into it.

@manulera
Collaborator Author

Hi @BjornFJohansson, when you merge the branches you mentioned in the call into cutsite_pairs, let me know and I can add the new assembly implementation.
