Make case ids shorter and easier to read #2973

Open
maciej-zarzeczny opened this issue Mar 9, 2023 · 2 comments
Labels
Data UI (Bug is related to Data frontend functionality), turnkey

Comments


maciej-zarzeczny commented Mar 9, 2023

Currently, cases are identified by default MongoDB ObjectIds (24 hexadecimal characters). We should make them shorter and easier for curators to read.

maciej-zarzeczny added the Data UI and turnkey labels Mar 9, 2023

abhidg commented Mar 15, 2023

One approach is to keep the ObjectIds as they are in the DB, but use a frontend function to make them more readable. Any reduction in information could lead to more collisions, but I think this scheme makes them extremely unlikely:

A number in three parts, separated by hyphens, derived from the timestamp and counter embedded in the ObjectId:

  • Number of days since the outbreak (should be well-defined as we have OUTBREAK_DATE)
  • Number of seconds elapsed on the day in the timestamp, integer divided by 100, so this will go from 0 to 863
  • Last two bytes (0 to 65535) of the incrementing number in the last block of the ObjectId (a 3-byte incrementing counter, initialized to a random value)

Assuming an outbreak lasts up to 1000 days (about 3 years, which would be a pandemic and thus unlikely to happen frequently), the ID would have at most 11 digits (3 + 3 + 5), without touching the DB at all. In most cases, curators working on a single day’s cases would only need the last two parts, as the number of days since the outbreak would be the same.
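A minimal frontend sketch of the three-part scheme described above, to make the arithmetic concrete. The function name `shortCaseId` and the `OUTBREAK_DATE` value are placeholders for illustration, and day boundaries are assumed to be UTC; the real outbreak date would come from the deployment's configuration.

```typescript
// Placeholder value for illustration only; the real OUTBREAK_DATE is deployment-specific.
const OUTBREAK_DATE = new Date("2022-05-01T00:00:00Z");

const SECONDS_PER_DAY = 86400;

// Hypothetical helper: turns a 24-character ObjectId hex string into "days-bucket-counter".
function shortCaseId(objectId: string, outbreakDate: Date = OUTBREAK_DATE): string {
  if (!/^[0-9a-fA-F]{24}$/.test(objectId)) {
    throw new Error(`Not a valid ObjectId: ${objectId}`);
  }

  // First 4 bytes (8 hex characters): creation time in seconds since the Unix epoch.
  const timestamp = parseInt(objectId.slice(0, 8), 16);

  // Part 1: whole days between the outbreak date and the creation date (UTC day boundaries assumed).
  const outbreakDay = Math.floor(outbreakDate.getTime() / 1000 / SECONDS_PER_DAY);
  const days = Math.floor(timestamp / SECONDS_PER_DAY) - outbreakDay;

  // Part 2: seconds elapsed on that day, integer divided by 100 (0 to 863).
  const secondsBucket = Math.floor((timestamp % SECONDS_PER_DAY) / 100);

  // Part 3: last two bytes of the 3-byte counter (0 to 65535).
  const counter = parseInt(objectId.slice(20, 24), 16);

  return `${days}-${secondsBucket}-${counter}`;
}
```

Under these assumptions, a case created 10 days and 12345 seconds after the outbreak date, with a counter ending in `00ff`, would be displayed as `10-123-255`. The stored ObjectId is never modified; only the displayed form changes.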

This assumes numerical IDs only. If alphanumeric IDs are acceptable, we can shorten further by using hex, or by using one of several naming systems such as https://pypi.org/project/human-id/ that map UUIDs to a string of words. The disadvantage is that alphanumeric systems usually lack monotonicity.
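To illustrate the hex option, here is a small sketch that re-encodes each part of a numeric three-part ID in another base. The function name `toAlphanumericId` is hypothetical, and the base 36 variant is an extra assumption, not something proposed in this thread.

```typescript
// Hypothetical helper: re-encodes each hyphen-separated decimal part in the given radix.
function toAlphanumericId(numericId: string, radix: number = 16): string {
  return numericId
    .split("-")
    .map((part) => {
      const value = parseInt(part, 10);
      if (isNaN(value) || value < 0) {
        throw new Error(`Invalid ID part: "${part}"`);
      }
      // Number.prototype.toString accepts a radix between 2 and 36.
      return value.toString(radix);
    })
    .join("-");
}

// Usage: the numeric ID "10-123-255" in hex and in base 36.
const hexId = toAlphanumericId("10-123-255");        // "a-7b-ff"
const base36Id = toAlphanumericId("10-123-255", 36); // "a-3f-73"
```

Under the 1000-day assumption, the worst case drops from 11 digits to 10 characters in hex, or 8 in base 36. Each part is still an ordinary number in a different base, so ordering is preserved only if parts are compared numerically rather than as strings.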

@maciej-zarzeczny

@abhidg Those are all great ideas! I think it all depends on the curators' preferences. @aimeehan1 is there any solution that works better for you than the others?
