DOT and GML format writers are extremely slow when there are many attributes #2567
Comments
My hunch is that the issue is the constant memory (re)allocations in |
Okay, this bothers me and it's a fuzzing issue so let's try to tackle it. I'll make a quick patch with an internal "resizable string" data type and re-write |
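To make the "resizable string" idea concrete, here is a minimal sketch of a growable buffer that is reused across escape calls, so memory is reallocated only when a longer string than any seen before shows up. This is not the code from the patch: the `strbuf_*` names are hypothetical, and the escaping rule (backslash before `"` and `\`) is only a stand-in for whatever each format actually requires.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer; NOT igraph's internal type. */
typedef struct {
    char *data;   /* NUL-terminated once initialised */
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* allocated bytes, including room for the terminator */
} strbuf_t;

/* Returns 0 on success, -1 on allocation failure. */
int strbuf_init(strbuf_t *sb) {
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL) return -1;
    sb->data[0] = '\0';
    return 0;
}

/* Ensure room for `extra` more characters; capacity doubles, so the
 * number of realloc() calls grows only logarithmically with length. */
int strbuf_reserve(strbuf_t *sb, size_t extra) {
    size_t needed = sb->len + extra + 1;
    size_t new_cap = sb->cap;
    char *p;
    if (needed <= sb->cap) return 0;
    while (new_cap < needed) new_cap *= 2;
    p = realloc(sb->data, new_cap);
    if (p == NULL) return -1;
    sb->data = p;
    sb->cap = new_cap;
    return 0;
}

int strbuf_append_char(strbuf_t *sb, char c) {
    if (strbuf_reserve(sb, 1) != 0) return -1;
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 0;
}

/* Overwrite the buffer with an escaped copy of `src`, keeping the
 * existing allocation. Illustrative rule: prefix '"' and '\\' with '\\'. */
int strbuf_set_escaped(strbuf_t *sb, const char *src) {
    assert(sb->data != NULL);
    sb->len = 0;
    sb->data[0] = '\0';
    for (; *src != '\0'; src++) {
        if (*src == '"' || *src == '\\') {
            if (strbuf_append_char(sb, '\\') != 0) return -1;
        }
        if (strbuf_append_char(sb, *src) != 0) return -1;
    }
    return 0;
}

void strbuf_destroy(strbuf_t *sb) {
    free(sb->data);
    sb->data = NULL;
    sb->len = 0;
    sb->cap = 0;
}
```

A writer would call `strbuf_init` once, `strbuf_set_escaped` for every attribute value, and `strbuf_destroy` at the end, instead of allocating and freeing a fresh escaped copy per value.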
…e memory all the time when escaping strings, refs #2567
Okay, check out the commit above. It rewrites |
Okay, but then why is this not a problem for GraphML? The same construct is being used there (i.e. loop over all vertices/edges, then within each vertex/edge loop over all attributes and look them up with |
I was thinking about the same too last night, but I was too tired. It is possible that the string attribute values have characters that are forbidden in XML, and the GraphML writer fails early. See the code here: igraph/fuzzing/write_all_gml.cpp, lines 88 to 99 in 4dcd230
@ntamas So the problem is that attribute lookup uses a dumb linear search in the C attribute handler. Does this affect the Python attribute handler? @krlmlr Do you know if this affects the R attribute handler? If only C is affected, I propose that we defer this to 2.0, as that's when the big attribute handling refactoring will happen (assuming we get funding for it...) |
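To illustrate why name-based linear search hurts when there are many attributes, here is a small model of the access pattern, not the actual C attribute handler. The `attr_record_t` type and both functions are hypothetical, and attribute names are assumed to be unique. With A attributes and V vertices, looking every attribute up by name inside the vertex loop costs about V * A * (A + 1) / 2 string comparisons, while resolving each name once up front costs only A * (A + 1) / 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute record: a name plus one integer per vertex. */
typedef struct {
    const char *name;
    const int *values;
} attr_record_t;

/* Linear search by name; counts string comparisons in *n_cmp.
 * Returns the index of the record, or -1 if the name is absent. */
long attr_find(const attr_record_t *attrs, size_t n_attrs,
               const char *name, size_t *n_cmp) {
    size_t i;
    for (i = 0; i < n_attrs; i++) {
        (*n_cmp)++;
        if (strcmp(attrs[i].name, name) == 0) return (long) i;
    }
    return -1;
}

/* Writer-style loop: every attribute is looked up by name for every
 * vertex. Summing the values stands in for writing them out. */
long sum_lookup_per_vertex(const attr_record_t *attrs, size_t n_attrs,
                           size_t n_vertices, size_t *n_cmp) {
    long total = 0;
    size_t v, a;
    for (v = 0; v < n_vertices; v++) {
        for (a = 0; a < n_attrs; a++) {
            long idx = attr_find(attrs, n_attrs, attrs[a].name, n_cmp);
            assert(idx >= 0);
            total += attrs[idx].values[v];
        }
    }
    return total;
}

/* Same result, but each name is resolved once, outside the vertex loop. */
long sum_lookup_hoisted(const attr_record_t *attrs, size_t n_attrs,
                        size_t n_vertices, size_t *n_cmp) {
    long total = 0;
    size_t v, a;
    for (a = 0; a < n_attrs; a++) {
        long idx = attr_find(attrs, n_attrs, attrs[a].name, n_cmp);
        assert(idx >= 0);
        for (v = 0; v < n_vertices; v++) {
            total += attrs[idx].values[v];
        }
    }
    return total;
}
```

Hoisting the lookup is only one possible mitigation; a hash table keyed by attribute name would remove the linear scan altogether.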
It's only the C attribute handler, which, in its current state, is yet another example of a temporary solution that turned out to be permanent :-D It was never meant to be performant, we just needed something simple to enable us to use attributes from C in unit tests, and then things escalated quickly. Neither the R nor the Python attribute handler is affected. The Python attribute handler uses Python dicts.
The DOT format writer is surprisingly slow, causing timeouts in the write_all_gml and write_all_graphml fuzz targets. Originally, I believed that the performance issues were due to Flex (which is also an existing problem!), but it seems this is a different thing. Commenting this out makes the problem go away: igraph/fuzzing/write_all_gml.cpp, lines 63 to 66 in e23c746
The corresponding OSS-Fuzz issue, with test cases, is at: