Updated port names should break existing tests #142
Comments
An additional comment on the usability of the new approach. In `_addSomething`:

```python
def _addSomething(self, other, key, IdText=None):
    if key not in self:
        self[key] = []
    if other["oid"] not in self[key]:
        append = {other["oid"]: IdText} if IdText else other["oid"]
        self[key].append(append)
```

This leads to the following competing structures, previous and updated:

```
# Previous
"producers": ["1_-13_0/0/0/0", "1_-15_0/0/0/0"]

# Updated
"producers": [{"1_-13_0/0/0/0": "event"}, {"1_-15_0/0/0/0": "event"}, "..."]
```

If one wants to iterate over all `"producers"` in the previous example, one can do the following:

```python
for producer in producers:
    nx.graph.nodes(producer)  # or something like this
```

However, the new approach requires the following:

```python
for producer in producers:
    prod_keys = list(producer.keys())  # create a list of size 1
    nx.graph.nodes(prod_keys[0])
```

This is costly and makes the code much more obfuscated.
An update here is that this leads to a serious performance hit: iterating through a ~8000-node graph now takes 80+ seconds. This is because not everything in the producers list is a dictionary, so every entry has to be type-checked:

```python
for uentry in element['producers']:
    try:
        uentry.keys()
    except AttributeError:
        u = uentry
    else:
        u = list(uentry.keys())[0]
```

The data coupling here is odd, too, as evidenced by the requirement to retrieve the ID out of a single-entry dictionary. A better structure would be:

```
"producers": [{"id": "1_-13_0/0/0/0", "type": "event"}, {"id": "1_-15_0/0/0/0", "type": "event"}, ...]
```

This would make ID access O(1), and it also makes it explicit that there is only one producer ID for each `"producers"` element (the same goes for `"consumers"`).
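To make the proposed shape concrete, here is a small sketch using hypothetical data with the explicit `"id"` and `"type"` keys suggested above:

```python
# Hypothetical entries following the structure proposed above,
# with explicit "id" and "type" keys per producer element.
producers = [
    {"id": "1_-13_0/0/0/0", "type": "event"},
    {"id": "1_-15_0/0/0/0", "type": "event"},
]

# Every entry is now a dict, so ID access is uniform and O(1),
# with no per-entry type checking required:
ids = [entry["id"] for entry in producers]
print(ids)  # → ['1_-13_0/0/0/0', '1_-15_0/0/0/0']
```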
I just did a test using the following:

and then ran:

i.e. `kv1` is a list of dictionaries with the same structure as we are using now. I'm not quite sure, off the top of my head, where the issue of not always having a dictionary in the producers list is coming from, but at any rate, for this test I don't see a performance hit at all. What I really would like to have is a dictionary of dictionaries of the form:
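The exact dictionary-of-dictionaries shape was not spelled out above; purely as a hypothetical illustration, one layout keyed by producer OID could look like this:

```python
# Purely hypothetical: one possible dictionary-of-dictionaries layout,
# keyed by producer OID (the shape was not given in the original comment).
producers = {
    "1_-13_0/0/0/0": {"type": "event"},
    "1_-15_0/0/0/0": {"type": "event"},
}

# Membership tests and per-producer attribute lookups are both O(1):
assert "1_-13_0/0/0/0" in producers
assert producers["1_-15_0/0/0/0"]["type"] == "event"
```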
Apologies - the performance hit was mostly due to an

There is still a definite performance hit because it is not just 8,000 nodes, but traversing edges as well (so north of 10,000 iterations in a nested for loop). The following shows a ~10x performance difference between the approaches:

```python
for element in dlg_json_dict:
    if ('storage' in element.keys()) and (
        'producers' in element and 'consumers' in element
    ):
        for uentry in element['producers']:
            a = uentry
        for ventry in element['consumers']:
            b = ventry
# runtime=0.015253782272338867
```

```python
for element in dlg_json_dict:
    if ('storage' in element.keys()) and (
        'producers' in element and 'consumers' in element
    ):
        for uentry in element['producers']:
            try:
                uentry.keys()
            except AttributeError:
                u = uentry
            else:
                u = list(uentry.keys())[0]
        for ventry in element['consumers']:
            try:
                ventry.keys()
            except AttributeError:
                v = ventry
            else:
                v = list(ventry.keys())[0]
# runtime=0.1401374340057373
```

The runtime is much more negligible overall, but having to check for different types does lead to a performance degradation. The previous comments describe how the mixed list arises:

```python
append = {other["oid"]: IdText} if IdText else other["oid"]
```

It looks like
I'm not familiar with the whole story here, but you can replace `list(x.keys())[0]` by `next(x.keys())` to save creating a list each time, hopefully reducing your runtime costs.
I kept the original list because of the backwards compatibility, also with the tests. But I will have a look into this again. The point I'm not sure about is why you seem to be getting a mixed list, with both dictionaries and just IDs as elements. That really should not happen. The code you are referring to suggests that it might be possible, but if the system is driven from a graph, the list should only contain the dictionaries.

Also, the list comprehension I've used with the if/else statement is less expensive than relying on a try/except block. The `next()` call for `dict_keys` does not work because these are not iterators. However, you can omit the list conversion completely if you are using a list comprehension:

```python
kkeys = [k.keys() if isinstance(k, dict) else k for k in kv]
```
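A quick sketch of the `next()` point made above: dict views are iterable but not iterators, so `next()` needs an explicit `iter()` to avoid building an intermediate list (example data is illustrative):

```python
d = {"1_-13_0/0/0/0": "event"}

# dict_keys is a view, not an iterator, so calling next() on it directly
# raises TypeError:
try:
    next(d.keys())
except TypeError:
    pass  # "dict_keys object is not an iterator"

# Wrapping the view in iter() works, and avoids creating a list:
key = next(iter(d.keys()))  # equivalently: next(iter(d))
print(key)  # → 1_-13_0/0/0/0
```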
…On Wed., 4 May 2022, 17:58, rtobar wrote:

> I'm not familiar with the whole story here, but you can replace `list(x.keys())[0]` by `next(x.keys())` to save creating a list each time, hopefully reducing your runtime costs.
Log message formatting should be left to the logging system instead of being carried out by users; this is because in many cases, especially at the lower logging levels like DEBUG, messages will actually be filtered out and never make it to any of the sinks. This is sadly a very common mistake one sees in the wild, and can cause real performance issues. These might have been the source of the slow logging reported in #142, but even if they weren't, the change is beneficial. Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
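The point can be demonstrated with a small self-contained sketch (the names here are illustrative, not from the DALiuGE code base). With eager formatting, the message string is built even when the record is filtered out; with lazy `%`-style arguments, the logging system skips formatting entirely:

```python
import logging

logging.basicConfig(level=logging.INFO)  # DEBUG records are filtered out
logger = logging.getLogger("example")

class CostlyRepr:
    """Counts how often its string form is actually computed."""
    calls = 0
    def __str__(self):
        CostlyRepr.calls += 1
        return "expensive"

obj = CostlyRepr()

# Eager: the full message is formatted before logging decides to drop it,
# so __str__ runs even though nothing is emitted.
logger.debug("value: " + str(obj))

# Lazy: arguments are only interpolated if the record passes the level
# check, so the filtered DEBUG call never touches obj.
logger.debug("value: %s", obj)

print(CostlyRepr.calls)  # → 1 (only the eager call paid the cost)
```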
The change to 'port names' described in 01d11f4 should be updated in test cases, but existing test cases pass. This demonstrates a defect in the coupling and test management of the translator.
Previously, translating from a Logical Graph (LG) to a Physical Graph Template (PGT) led to:
Now the approach is:
This appears to pass all tests in the translator, but this is because `test_pg_gen.py` does not test the output of the translation. In fact, most of the tests do not even run `assert`, meaning all they do is confirm that no runtime errors occur.

The reason for this failure is that the test cases use older graphs (`daliuge-engine/test/graphs`). The only places I can see where `"consumers"`/`"producers"` are invoked are in the `daliuge-engine` code base (e.g. `test_dm.py`). However, these drops are created as dictionaries rather than as mocked drop objects based on `dropdict` in `daliuge-common`. If there is a change to a foundational class like this, it should be reflected in failing tests across the codebases. For example, a hard-coded reflection of `dropdict` in `test_dm.py` passes just fine:

However, these tests should fail given that they use a now-outdated approach to managing "ports".

(As an aside, is it possible to entertain the idea of changing the collective term for `producers` and `consumers` from "ports"? Given the codebase also contains network and server operations in which ports have a functionally different meaning, it seems like a better naming convention could be devised.)