Hi @BjornFJohansson I had started working on a similar refactoring to #157 for the assemblies (find which assemblies can be made with one function, execute them with another). To get this to work, I had to change things significantly, so my implementation has become quite different. I still use a graph, but the meaning of edges is different. I have not implemented the Contig functionality, but I could do something similar, if you think it would make sense to include this assembly in pydna. For now, it is in the ShareYourCloning repo.
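To make the find/execute split concrete, here is a minimal sketch. It is not the ShareYourCloning code: the names `find_joins` and `execute_join` are hypothetical, fragments are plain strings rather than Dseqrecord objects, and only end-to-start overlaps between two linear fragments are considered.

```python
def find_joins(frags, limit):
    """Planning step (hypothetical): list (i, j, n) where the last n bases
    of frags[i] equal the first n bases of frags[j], with n >= limit."""
    joins = []
    for i, a in enumerate(frags):
        for j, b in enumerate(frags):
            if i == j:
                continue
            # Try the longest possible overlap first and stop at the first hit.
            for n in range(min(len(a), len(b)), max(limit, 1) - 1, -1):
                if a[-n:] == b[:n]:
                    joins.append((i, j, n))
                    break
    return joins


def execute_join(frags, join):
    """Execution step (hypothetical): build the product described by one join."""
    i, j, n = join
    # Keep the shared region once by dropping it from the second fragment.
    return frags[i] + frags[j][n:]


frags = ["CCCCGATTACA", "GATTACATTTT"]
plan = find_joins(frags, limit=7)
products = [execute_join(frags, join) for join in plan]
```

The point of the split is that `plan` can be inspected, filtered or deduplicated before any sequence is actually built.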
You can find the implementation in this_file. It passes all the tests that pydna's assembly passes, with two differences: the order in which fragments are returned, and the fact that it only returns unique assemblies (see test file).
If you want to have a look, I would start from the docstring of Assembly, and then read add_edges_from_match to understand how it works. I think it is considerably simpler than the original implementation. For example, the construction of the graph is very short (the function add_edges_from_match is quite short as well):
def __init__(self, frags: list[_Dseqrecord], limit=25, algorithm=common_sub_strings, use_fragment_order=True, use_all_fragments=False):
    # TODO: allow for the same fragment to be included more than once?
    G = _nx.MultiDiGraph()
    # Add positive and negative nodes for forward and reverse fragments
    G.add_nodes_from((i + 1, {'seq': f}) for (i, f) in enumerate(frags))
    G.add_nodes_from((-(i + 1), {'seq': f.reverse_complement()}) for (i, f) in enumerate(frags))

    # Iterate over all possible combinations of fragments
    edge_pairs = _itertools.combinations(filter(lambda x: x > 0, G.nodes), 2)
    for index_first, index_secnd in edge_pairs:
        first = G.nodes[index_first]['seq']
        secnd = G.nodes[index_secnd]['seq']

        # Overlaps where both fragments are in the forward orientation
        matches_fwd = algorithm(str(first.seq).upper(), str(secnd.seq).upper(), limit)
        for match in matches_fwd:
            add_edges_from_match(match, index_first, index_secnd, first, secnd, G)

        # Overlaps where the first fragment is in the forward orientation and the second in the reverse orientation
        matches_rvs = algorithm(str(first.seq).upper(), reverse_complement(str(secnd.seq).upper()), limit)
        for match in matches_rvs:
            add_edges_from_match(match, index_first, -index_secnd, first, secnd, G)
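For readers who want to experiment with the signed-node idea without pydna or networkx, the following standard-library sketch mimics the construction above under several assumptions: fragments are plain strings, `naive_common_substrings` is a hypothetical stand-in for `common_sub_strings` that returns `(start_a, start_b, length)` tuples, and each match is stored as a single edge, which is simpler than whatever `add_edges_from_match` actually records.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def naive_common_substrings(a, b, limit):
    """Hypothetical stand-in: maximal shared substrings of length >= limit,
    as (start_a, start_b, length), found with a quadratic DP table."""
    matches = []
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # Report only matches that cannot be extended to the right.
                at_end = i == len(a) or j == len(b)
                if (at_end or a[i] != b[j]) and curr[j] >= limit:
                    n = curr[j]
                    matches.append((i - n, j - n, n))
        prev = curr
    return matches


def build_overlap_graph(frags, limit=25, algorithm=naive_common_substrings):
    # Node i+1 is fragment i in forward orientation, -(i+1) its reverse complement.
    nodes = {}
    for i, f in enumerate(frags):
        nodes[i + 1] = f.upper()
        nodes[-(i + 1)] = reverse_complement(f.upper())
    edges = []
    for u, v in combinations([n for n in nodes if n > 0], 2):
        # Both fragments forward.
        for match in algorithm(nodes[u], nodes[v], limit):
            edges.append((u, v, match))
        # First fragment forward, second reversed.
        for match in algorithm(nodes[u], nodes[-v], limit):
            edges.append((u, -v, match))
    return nodes, edges


nodes, edges = build_overlap_graph(["CCCCGATTACA", "GATTACATTTT"], limit=7)
```

Passing the second fragment as its reverse complement (`"AAAATGTAATC"`) yields the same match on an edge to node `-2` instead of `2`, which is what the sign convention is meant to capture.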
Let me know what you think.
OK, lots to digest. In principle I don't mind putting this alongside the existing code and eventually replacing it. I'll read it carefully and get back to you.
Yes, actually the code has changed slightly since then, although it is still in the same spirit. I am currently using that new assembly module to do PCR, ligations (including partial ones) and homologous recombination. I will adapt it for Gibson and Golden Gate as well. I am just writing slightly different algorithm functions for each case.
If you think it can be useful, I can give you a short overview on Zoom before you dive into it.
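As an illustration of what "slightly different algorithm functions" could mean, here is a sketch of two interchangeable functions sharing the `(seq_a, seq_b, limit)` signature seen in the graph construction. Both are hypothetical toy versions, not the functions from ShareYourCloning, and the `(start_a, start_b, length)` return format is an assumption.

```python
def terminal_overlap(seq_a, seq_b, limit):
    """Toy Gibson-like rule: the end of seq_a must equal the start of seq_b."""
    matches = []
    for n in range(min(len(seq_a), len(seq_b)), max(limit, 1) - 1, -1):
        if seq_a[-n:] == seq_b[:n]:
            matches.append((len(seq_a) - n, 0, n))
    return matches


def primer_anneal(template, primer, limit):
    """Toy PCR-like rule: the last `limit` bases of the primer must occur in
    the template. The match is not extended beyond those bases."""
    footprint = primer[-limit:]
    matches = []
    pos = template.find(footprint)
    while pos != -1:
        matches.append((pos, len(primer) - limit, limit))
        pos = template.find(footprint, pos + 1)
    return matches


# Because the signatures agree, either function can be passed as `algorithm`.
gibson_like = terminal_overlap("CCCCGATTACA", "GATTACATTTT", 7)
pcr_like = primer_anneal("AAACCCGGGTTT", "TTCCCGGG", 6)
```

Keeping the rule for what counts as a valid match in a separate function is what lets one graph-building routine serve several cloning methods.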
Hi @BjornFJohansson, when you merge the branches you mentioned in the call into cutsite_pairs, let me know and I can add the new assembly implementation.