I ran a line profiler on the generate_graph_consumer function using the human_review dataset (about 20k proteins; 20,386 entries were processed in this run).
It produced the following output:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
114 def generate_graph_consumer(entry_queue, graph_queue, common_out_queue, proc_id, **kwargs):
115 """
116 TODO
117 describe kwargs and consumer until a graph is generated and digested etc ...
118 """
119 # Set proc id
120 1 3.0 3.0 0.0 kwargs["proc_id"] = proc_id
121
122 # Set feature_table dict boolean table
123 1 1.0 1.0 0.0 ft_dict = dict()
124 1 1.0 1.0 0.0 if kwargs["feature_table"] is None or len(kwargs["feature_table"]) == 0 or "ALL" in kwargs["feature_table"]:
125 1 6.0 6.0 0.0 ft_dict = dict(VARIANT=True, VAR_SEQ=True, SIGNAL=True, INIT_MET=True, MUTAGEN=True, CONFLICT=True)
126 else:
127 for i in kwargs["feature_table"]:
128 ft_dict[i] = True
129
130 # Initialize the exporters for graphs
131 1 58.0 58.0 0.0 graph_exporters = Exporters(**kwargs)
132
133 while True:
134 # Get next entry
135 20387 7568810.0 371.3 0.7 entry = entry_queue.get()
136
137 # Stop if entry is None
138 20387 18156.0 0.9 0.0 if entry is None:
139 # --> Stop Condition of Process
140 1 3.0 3.0 0.0 break
141
142 # Beginning of Graph-Generation
143 # We also collect interesting information here!
144
145 # Generate canonical graph (initialization of the graph)
146 20386 4416910.0 216.7 0.4 graph = _generate_canonical_graph(entry.sequence, entry.accessions[0])
147
148 # FT parsing and appending of Nodes and Edges into the graph
149 # The amount of isoforms, etc.. can be retrieved on the fly
150 20386 22188.0 1.1 0.0 num_isoforms, num_initm, num_signal, num_variant, num_mutagens, num_conficts =\
151 20386 312577641.0 15333.0 29.4 _include_ft_information(entry, graph, ft_dict)
152
153 # Replace Amino Acids based on user defined rules: E.G.: "X -> A,B,C"
154 20386 83272.0 4.1 0.0 replace_aa(graph, kwargs["replace_aa"])
155
156 # Digest graph with enzyme (unlimited miscleavages)
157 20386 457306111.0 22432.4 43.0 num_of_cleavages = digest(graph, kwargs["digestion"])
158
159 # Merge (summarize) graph if wanted
160 20386 29893.0 1.5 0.0 if not kwargs["no_merge"]:
161 20386 268518281.0 13171.7 25.3 merge_aminoacids(graph)
162
163 # Collapse parallel edges in a graph
164 20386 29694.0 1.5 0.0 if not kwargs["no_collapsing_edges"]:
165 20386 10804029.0 530.0 1.0 collapse_parallel_edges(graph)
166
167 # Annotate weights for edges and nodes (maybe even the smallest weight possible to get to the end node)
168 20386 948172.0 46.5 0.1 annotate_weights(graph, **kwargs)
169
170 # Calculate statistics on the graph:
171 20386 11921.0 0.6 0.0 (
172 20386 12094.0 0.6 0.0 num_nodes, num_edges, num_paths, num_paths_miscleavages, num_paths_hops,
173 20386 9768.0 0.5 0.0 num_paths_var, num_path_mut, num_path_con
174 20386 297176.0 14.6 0.0 ) = get_statistics(graph, **kwargs)
175
176 # Verify graphs if wanted:
177 20386 11624.0 0.6 0.0 if kwargs["verify_graph"]:
178 verify_graph(graph)
179
180 # Persist or export graphs with speicified exporters
181 20386 38415.0 1.9 0.0 graph_exporters.export_graph(graph, common_out_queue)
182
183 # Output statistics we gathered during processing
184 20386 10500.0 0.5 0.0 if kwargs["no_description"]:
185 entry_protein_desc = None
186 else:
187 20386 37338.0 1.8 0.0 entry_protein_desc = entry.description.split(";", 1)[0]
188 20386 37142.0 1.8 0.0 entry_protein_desc = entry_protein_desc[entry_protein_desc.index("=") + 1:]
189
190 40772 312422.0 7.7 0.0 graph_queue.put(
191 20386 12337.0 0.6 0.0 (
192 20386 11818.0 0.6 0.0 entry.accessions[0], # Protein Accesion
193 20386 10432.0 0.5 0.0 entry.entry_name, # Protein displayed name
194 20386 9196.0 0.5 0.0 num_isoforms, # Number of Isoforms
195 20386 9231.0 0.5 0.0 num_initm, # Number of Init_M (either 0 or 1)
196 20386 9244.0 0.5 0.0 num_signal, # Number of Signal Peptides used (either 0 or 1)
197 20386 9232.0 0.5 0.0 num_variant, # Number of Variants applied to this protein
198 20386 9227.0 0.5 0.0 num_mutagens, # Number of applied mutagens on the graph
199 20386 9231.0 0.5 0.0 num_conficts, # Number of applied conflicts on the graph
200 20386 9274.0 0.5 0.0 num_of_cleavages, # Number of cleavages (marked edges) this protein has
201 20386 9240.0 0.5 0.0 num_nodes, # Number of nodes for the Protein/Peptide Graph
202 20386 9269.0 0.5 0.0 num_edges, # Number of edges for the Protein/Peptide Graph
203 20386 9311.0 0.5 0.0 num_paths, # Possible (non repeating paths) to the end of a graph. (may conatin repeating peptides)
204 20386 9318.0 0.5 0.0 num_paths_miscleavages, # As num_paths, but binned to the number of miscleavages (by list idx, at 0)
205 20386 9288.0 0.5 0.0 num_paths_hops, # As num_paths, only that we bin by hops (E.G. useful for determine DFS or BFS depths)
206 20386 9363.0 0.5 0.0 num_paths_var, # Num paths of feture variant
207 20386 9519.0 0.5 0.0 num_path_mut, # Num paths of feture mutagen
208 20386 9508.0 0.5 0.0 num_path_con, # Num paths of feture conflict
209 20386 9476.0 0.5 0.0 entry_protein_desc, # Description name of the Protein (can be lenghty)
210 )
211 )
212
213 # Close exporters (maybe opened files, database connections, etc... )
214 1 13.0 13.0 0.0 graph_exporters.close()
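The bottlenecks, in descending order of their share of total time, are:
Digestion, digest (~43%)
Apply Features, _include_ft_information (~29%)
Merge Aminoacids, merge_aminoacids (~25%)
Together these three calls account for roughly 98% of the measured time.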
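A possible next step is to look inside each of these hot calls, since the line profile above only shows their total cost. The following is a minimal sketch using the standard-library cProfile and pstats modules (a different tool from the line profiler used above). The names profile_stage, fake_digest and _inner_work are hypothetical stand-ins for illustration; they are not part of the project, and fake_digest does not reproduce what digest does.

```python
import cProfile
import pstats


def profile_stage(func, *args, **kwargs):
    """Run func under cProfile and return (result, stats).

    Hypothetical helper: wraps a single call so that one pipeline
    stage can be profiled in isolation.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    return result, stats


# Hypothetical stand-ins for a hot stage and its inner helper.
# In the real consumer one would pass e.g. digest, graph, kwargs["digestion"].
def _inner_work(n):
    return sum(i * i for i in range(n))


def fake_digest(n):
    return [_inner_work(n) for _ in range(5)]


result, stats = profile_stage(fake_digest, 1000)

# Show the five entries with the highest cumulative time.
stats.sort_stats("cumulative").print_stats(5)
```

Sorting by cumulative time makes it easier to see which inner helpers dominate a stage. Running this on a handful of representative proteins, rather than the full dataset, should keep the profiling overhead manageable.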