Which database format should store primary data? #9

Open
jchodera opened this issue Oct 18, 2014 · 2 comments

@jchodera
Contributor

Currently, there is a Python pickle file that stores both primary data and derived data. This is very convenient for Python, but less convenient for anything that is not Python.

I wonder if we want to keep just the primary data in a nice, portable, small file from which everything (including convenient Python pickles) is derived. But what format should this be?

  • Python pickle (still not super convenient)
  • JSON?
  • XML?
  • SQLite?

As a reminder, we decided the primary data consisted of the following:

  • canonical isomeric SMILES
  • experimental data:
    • experimental value
    • experimental uncertainty
    • citation for experimental data
  • notes field
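To make the JSON option concrete, here is a minimal sketch of how one primary-data record could be stored as JSON and then turned into a Python pickle. The field names (`smiles`, `experimental`, `value`, `uncertainty`, `citation`, `notes`) and the helper `make_record` are assumptions for illustration only, not an agreed schema, and the values are placeholders rather than real measurements.

```python
import json
import pickle


def make_record(smiles, value, uncertainty, citation, notes=""):
    """Build one primary-data record (hypothetical layout, not an agreed schema)."""
    return {
        "smiles": smiles,  # canonical isomeric SMILES
        "experimental": {
            "value": value,
            "uncertainty": uncertainty,
            "citation": citation,
        },
        "notes": notes,
    }


# Placeholder entry: the numbers and citation are NOT real experimental data.
records = [
    make_record("CO", -1.0, 0.1, "placeholder citation", "placeholder values"),
]

# Serialize the primary data to portable JSON text.
json_text = json.dumps(records, indent=2, sort_keys=True)

# Read it back; any language with a JSON parser could do the same.
restored = json.loads(json_text)

# Derive the convenient Python pickle from the portable primary data.
pickled = pickle.dumps(restored)
```

With something like this, the JSON file would be the single source of truth and the pickle would just be a regenerated convenience artifact.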

Eventually, it would be great if there were also more provenance data for the experimental value (e.g. if Peter Guthrie had computed it by combining data from multiple publications and applying a conversion), but this is a more advanced topic.

@davidlmobley
Member

I’m OK with it being in some other format, as long as it is an easy format to work with. XML is the only one of those that I have even a passing familiarity with. Basically, whatever format we pick, my big request is to make sure that there is a good Python library for working with it, as I don’t want to have to learn something else to manage this. :) 
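On the library question: JSON, XML, and SQLite can all be handled from the Python standard library (`json`, `xml.etree.ElementTree`, `sqlite3`), so none of them should need an extra dependency. As a rough sketch of what the SQLite option might look like, assuming a single hypothetical table named `primary_data` whose columns mirror the primary data fields (the table name, column names, and values are illustrative placeholders, not real data):

```python
import sqlite3

# In-memory database for illustration; a real file path would be used in practice.
conn = sqlite3.connect(":memory:")

# Hypothetical table layout mirroring the primary data fields.
conn.execute(
    """
    CREATE TABLE primary_data (
        smiles      TEXT PRIMARY KEY,  -- canonical isomeric SMILES
        value       REAL,              -- experimental value
        uncertainty REAL,              -- experimental uncertainty
        citation    TEXT,              -- citation for experimental data
        notes       TEXT               -- free-form notes field
    )
    """
)

# Placeholder row: NOT a real measurement or citation.
conn.execute(
    "INSERT INTO primary_data VALUES (?, ?, ?, ?, ?)",
    ("CO", -1.0, 0.1, "placeholder citation", "placeholder values"),
)
conn.commit()

# Query the data back as plain Python tuples.
rows = conn.execute(
    "SELECT smiles, value, uncertainty FROM primary_data"
).fetchall()
```

The trade-off versus JSON is that an SQLite file is not human-readable or diff-friendly in version control, but it does give typed columns and queries for free.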

@jchodera
Contributor Author

Thanks, @davidlmobley!

Any other opinions here?
