Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential optimization opportunities. #928

Open
CiaranWelsh opened this issue Dec 9, 2021 · 6 comments
Open

Potential optimization opportunities. #928

CiaranWelsh opened this issue Dec 9, 2021 · 6 comments

Comments

@CiaranWelsh
Copy link

CiaranWelsh commented Dec 9, 2021

A lot of time is spent searching the sbml tree for elements.

image

Sbml does a linear search over all elements with an sid (so virtually everything) so this is slow, particularly for larger models. This search is being repeated many times and so we have a several potential targets for optimization. In no particular order:

SetValueCodeGenBase.cpp

In this case, the lookup is performed for each species individually in order to extract the hasOnlySubstanceUnits attribute. We could instead store this value in LLVMModelSymbols (not LLVMModelDataSymbols) and not need to traverse the sbml tree at all.

GetInitValueCodeGenBase

Appears to be duplicated code - so similar solution to SetValueCodeGenBase.cpp.

LLVMModelSymbols.cpp

There are three sbml visit functions all of which can use a more specific sbml element locating mechanism then getElemenetBySId, since we know which element type we're looking for.

EvalRateRuleRatesCodeGen.cpp

Another search to find the hasOnlySubstanceUnits attribute.

EvalVolitileStoicCodeGen.cpp

Does a linear sbml lookup to find out whether an element is constant, this information could again be stored. If not then we can refine the search because we know which sbml type we are looking for.

GetValueCodeGenBase.cpp

Linear lookup for getHasOnlySubstanceUnits

GetInitialValueCodeGenBase.cpp

Another linear lookup for getHasOnlySubstanceUnits

SBMLValidator.cpp

Not completely clear what this is doing but its using getElementBySId twice, once within a while loop. I'm also not sure how often the validator is actually used, since model validation is only done when a model fails to load. Still a target for optimization however.

ConservedMoietyConverter.cpp

Only need to do the search over ListOfRules rather than all sbml elements.
Second usage for ListOfEvents

rrRoadRunner.cpp

Be more specific with our search

@hsauro
Copy link

hsauro commented Dec 9, 2021 via email

@matthiaskoenig
Copy link
Collaborator

Hi all,
The problem is the use of getElementBySId and getElementByMetaId functions which should be never called due to their O(n) runtime ! This functions sound like what you want, but they are much too slow. The correct/fast way is to create a HashMap for object lookup using getListOfAllElements and use the hashmap for lookup afterwards which is O(1).

I just opened an issue in libsbml, because I also tripped over this multiple times before.
sbmlteam/libsbml#187
@luciansmith If this could be fixed in libsbml it would speedup a ton of code using libsbml.

Best Matthias

@CiaranWelsh
Copy link
Author

Hi Matthias, thanks for the input. Yep - its better to do the lookup once and store in a hash map. However I've held off with implementing the fix for a while because I want to see whether some existing data structures can be augmented. For instance, some of the "SymbolResolver" types could be augmented to store the "hasOnlySubstanceUnits" and then in several instances of the need for getElementBySId are not needed at all.

@luciansmith
Copy link

Making this change in libsbml is more complicated, because the document is editable. You would then have to keep the hashmap up to date as you added or removed objects, and even then, you'd have problems because it's easy while editing SBML to have a temporarily invalid model with two objects with the same SId. Ideally, you wouldn't create the hashmap until you were done editing the document, but I'm not sure libsbml itself could possibly know when that is.

@matthiaskoenig
Copy link
Collaborator

@luciansmith Good point. I thought it is easy to keep track of needed updates by just listening to setId() and setMetaId() of SBase. I.e. every time an SId is changed the HashMap is updated. Yes, there can be key collisions and only the last set SId would be returned (similar to the first SId in getElementBySId). The setId function is the only function changing the hash map so should be a localized addition.

Alternatively a solution could be to provide two new helper functions getElementBySIdMap and getElementByMetaIdMap. These would use the lists generated by getAllElementIdList with the same limitations, i.e., a call to populateAllElementIdList would be necessary to update the hashmap.

@CiaranWelsh
Copy link
Author

This has partly been implemented in release v2.2.0. Instead of searching through all sbml elements we make effort to restrict to only species or only compartments, depending on what is needed. I'm convinced we can still improve on this however and so will leave this issue open for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants