[Feature request]: Homogenization of data structures and physical representations #104

Open
1 of 6 tasks
laserkelvin opened this issue Jan 20, 2024 · 1 comment
Labels: code maintenance (Issue/PR for refactors, code clean up, etc.), data (Issues related to data loading, pipelining, etc.), good first issue (Good for newcomers)

Comments

@laserkelvin
Collaborator

Feature/behavior summary

To ensure consistency in modeling, each dataset in Open MatSciML Toolkit should provide uniform (or near-uniform) kinds of data. For example, it should be clear whether the coordinates provided are fractional or Cartesian, and every dataset should carry enough information to represent each data sample in a physically meaningful way, such as periodic boundary conditions (for use in e.g. shift vectors).
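
As a small illustration of why the coordinate convention and the cell both matter, here is a minimal sketch in plain Python. It assumes the common row-vector convention, where the rows of a 3x3 `cell` are the lattice vectors and Cartesian = fractional · cell. The function names are hypothetical and are not part of the toolkit.

```python
def frac_to_cart(frac, cell):
    """Convert one fractional coordinate to Cartesian.

    Assumes the rows of `cell` are the lattice vectors a, b, c
    (row-vector convention), so cart = f_a * a + f_b * b + f_c * c.
    """
    return [sum(frac[i] * cell[i][j] for i in range(3)) for j in range(3)]


def wrap_fractional(frac):
    """Wrap a fractional coordinate into [0, 1), assuming periodicity in all three directions."""
    return [x % 1.0 for x in frac]
```

Without a cell, fractional coordinates cannot be placed in space at all, and without knowing the convention the same numbers can describe two different structures, which is the kind of ambiguity this request is meant to remove.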

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

A good place to start would be to make sure each devset, and subsequently any serialized dataset we have, conforms to the following:

  1. Check whether the coordinates are fractional or not (if there are values outside the range 0 to 1, they're likely Cartesian)
  2. Check that we have enough information to create a Lattice object; this can be just a cell key, or the lattice parameters, as in Materials Project
  3. Print and list out the keys in each sample and construct a table of them, so that we can help contribute to [Feature request]: Standardized data structure for datasets #97
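
The three checks above could be sketched roughly as follows. This is only an illustration in plain Python: the key names (`pos`, `cell`, `lattice_params`) and all helper names are hypothetical, not the toolkit's actual schema, and samples are assumed to be plain dicts with coordinates stored as nested lists.

```python
from collections import Counter

# Hypothetical key names; the real devsets may use different ones.
COORD_KEY = "pos"
LATTICE_KEYS = ("cell", "lattice_params")


def looks_fractional(coords, tol=1e-6):
    """Heuristic only: True if every component lies within [0, 1].

    Values outside that range suggest Cartesian coordinates, but a small
    Cartesian structure can also pass, so treat the result as a hint.
    """
    return all(-tol <= x <= 1.0 + tol for row in coords for x in row)


def has_lattice_info(sample):
    """True if the sample has any key from which a lattice could be built."""
    return any(key in sample for key in LATTICE_KEYS)


def survey(samples):
    """Count likely coordinate types, missing lattices, and key usage."""
    report = {"fractional": 0, "cartesian": 0, "missing_lattice": 0, "keys": Counter()}
    for sample in samples:
        report["keys"].update(sample.keys())
        if COORD_KEY in sample:
            kind = "fractional" if looks_fractional(sample[COORD_KEY]) else "cartesian"
            report[kind] += 1
        if not has_lattice_info(sample):
            report["missing_lattice"] += 1
    return report


def key_table(report, total):
    """Render key counts as tab-separated rows: key, count, share of samples."""
    rows = [
        f"{key}\t{count}\t{count / total:.0%}"
        for key, count in sorted(report["keys"].items())
    ]
    return "\n".join(rows)
```

Running `survey` over a handful of samples per dataset and pasting the `key_table` output into the issue would give a quick side-by-side view of which keys each dataset provides.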

We should also check other projects, like ColabFit, to see to what extent we can conform to community standards, too.

Additional notes

Can't assign Bin yet, but it would be good for Bin to aggregate information, and for him and @melo-gonzo to craft PRs that address things after the survey is done.

@laserkelvin added the good first issue, data, and code maintenance labels on Jan 20, 2024
@bmuaz

bmuaz commented Jan 23, 2024

I had the same thoughts about the data structures and will be happy to work on it with Carmelo.
