Any idea how to improve the performance when handling large ontologies? #44

Open
leonqli opened this issue Jun 13, 2018 · 7 comments

@leonqli

leonqli commented Jun 13, 2018

No description provided.

@lambdamusic
Owner

Do you have any sample ontology in mind? I'm trying to wrap my head around this problem so it'd be useful to have some sample models for testing.

@leonqli
Author

leonqli commented Dec 10, 2018

you may try with this ontology: http://bioportal.bioontology.org/ontologies/NCBITAXON

@lambdamusic
Owner

The main problem is that ontospy attempts to build the entire ontology model in memory, and that takes time if there are many classes and properties to correlate.

I've tried using threads, but with no real performance improvement, as the main tasks (extracting classes, properties, concepts, etc.) tend to depend on each other.

For very large ontologies it may be more appropriate to use a triplestore. Otherwise I'm kind of out of ideas here.

@leonqli
Author

leonqli commented Jan 3, 2019

You may want to take a look at https://pythonhosted.org/Owlready2/. It seems to have better performance on large ontologies.

@lambdamusic
Owner

lambdamusic commented Jan 3, 2019

Thanks! Looks like they use an ad-hoc back end; maybe that's it. Will look more into it.
Update: the back end is an optimized SQLite index (e.g. view here).
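
To make the idea of an SQLite-backed index concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It is not Owlready2's actual schema and not anything ontospy does today: the `triples` table, the index names, and the helper functions `build_index`, `direct_subclasses`, and `all_subclasses` are all hypothetical, chosen for illustration. The point is that class relationships can be looked up on demand from an indexed table rather than materialized as one in-memory model.

```python
import sqlite3

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def build_index(triples, path=":memory:"):
    """Store (subject, predicate, object) strings in an indexed SQLite table.

    Hypothetical schema for illustration; pass a file path to persist it.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")
    # Index for lookups by predicate + object (e.g. "who is a subclass of X?").
    conn.execute("CREATE INDEX IF NOT EXISTS idx_pos ON triples (p, o, s)")
    # Index for lookups by subject (e.g. "what is known about X?").
    conn.execute("CREATE INDEX IF NOT EXISTS idx_spo ON triples (s, p, o)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def direct_subclasses(conn, class_iri):
    """Return the direct subclasses of class_iri, sorted by IRI."""
    rows = conn.execute(
        "SELECT s FROM triples WHERE p = ? AND o = ? ORDER BY s",
        (RDFS_SUBCLASS_OF, class_iri),
    )
    return [row[0] for row in rows]


def all_subclasses(conn, class_iri):
    """Return all transitive subclasses of class_iri via a recursive CTE.

    UNION (rather than UNION ALL) drops duplicates, so a cyclic hierarchy
    cannot make the recursion run forever.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE sub(iri) AS (
            SELECT s FROM triples WHERE p = :p AND o = :root
            UNION
            SELECT t.s FROM triples t JOIN sub ON t.o = sub.iri
            WHERE t.p = :p
        )
        SELECT iri FROM sub ORDER BY iri
        """,
        {"p": RDFS_SUBCLASS_OF, "root": class_iri},
    )
    return [row[0] for row in rows]
```

With this layout, a call such as `all_subclasses(conn, some_class_iri)` touches only the rows it needs, so the cost depends on the size of the answer rather than on the size of the whole ontology. Whether this would speed up ontospy in practice is an open question; the sketch only shows the access pattern.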

@leonqli
Author

leonqli commented Jan 10, 2019

Yes, they use SQLite as the back end. Do you think it would help improve the performance of ontospy?

@jclerman

It's also not too difficult to load an ontology into Apache Jena Fuseki. The main issue is the non-Python dependency (Fuseki), but once the store is running it's easy to use rdflib to mediate querying.
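
As a rough illustration of querying a running store, the sketch below sends a SELECT query over the standard SPARQL 1.1 Protocol using only the Python standard library. It deliberately swaps rdflib for plain `urllib` so the example has no third-party dependency; rdflib could take the place of the hand-rolled HTTP call. The helper names (`build_request`, `parse_bindings`, `select`) are hypothetical, and the endpoint URL you pass in depends entirely on how your Fuseki dataset is configured.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol POST request with a form-encoded query."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,  # supplying a body makes urllib issue a POST
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def parse_bindings(payload, var):
    """Extract the values bound to `var` from a SPARQL JSON results document."""
    doc = json.loads(payload)
    return [b[var]["value"] for b in doc["results"]["bindings"] if var in b]


def select(endpoint, query, var):
    """Run a SELECT query against the endpoint and return the values of `var`.

    Performs a network call, so it needs a reachable SPARQL endpoint.
    """
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return parse_bindings(resp.read().decode("utf-8"), var)
```

Assuming a dataset named `ds` served locally (an assumption, not something from this thread), usage might look like `select("http://localhost:3030/ds/sparql", "SELECT ?cls WHERE { ?cls a <http://www.w3.org/2002/07/owl#Class> } LIMIT 10", "cls")`. The store does the heavy lifting, so the Python side only ever holds the rows it asked for.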
