support primary key #266

Open
1 of 4 tasks
nautaa opened this issue Sep 10, 2021 · 0 comments

nautaa commented Sep 10, 2021

The plan is to support primary keys so that duplicate rows can be detected and deduplicated.

As a first step, only a primary key on a single column will be supported. When a row is inserted, its primary key is used to check whether the row already exists in the table. If a row with the same primary key is already present, the insertion is skipped.
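
A minimal sketch of the skip-on-duplicate rule, assuming an in-memory table and using std's HashSet as the deduplication container. Row, Table, and insert are hypothetical names for illustration, not this project's API.

```rust
use std::collections::HashSet;

// Hypothetical row with a single-column primary key (`id`).
#[derive(Debug, Clone)]
struct Row {
    id: i64,
    name: String,
}

// Hypothetical in-memory table; not this project's actual storage layer.
#[derive(Default)]
struct Table {
    rows: Vec<Row>,
    // Deduplication container: primary keys of every row in `rows`.
    pk_set: HashSet<i64>,
}

impl Table {
    /// Inserts `row` unless a row with the same primary key already exists.
    /// Returns true if inserted, false if the insertion was skipped.
    fn insert(&mut self, row: Row) -> bool {
        // HashSet::insert returns false when the key is already present.
        if !self.pk_set.insert(row.id) {
            return false;
        }
        self.rows.push(row);
        true
    }
}

fn main() {
    let mut table = Table::default();
    table.insert(Row { id: 1, name: "a".into() });
    table.insert(Row { id: 1, name: "b".into() }); // duplicate key: skipped
    table.insert(Row { id: 2, name: "c".into() });
    let names: Vec<&str> = table.rows.iter().map(|r| r.name.as_str()).collect();
    println!("{:?}", names); // prints ["a", "c"]
}
```

Checking the container before touching the row storage keeps the duplicate check to a single set lookup per insert.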

The initial plan is to implement deduplication by maintaining an in-memory deduplication container for each table. When the database restarts, the primary key column is read back from disk and the in-memory container is rebuilt.
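
Recovery could look roughly like the following sketch. It assumes, hypothetically, that the primary key column is persisted as one decimal key per line; rebuild_pk_set is an illustrative name, and a real implementation would read the project's own on-disk format.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Cursor};

/// Rebuilds the in-memory deduplication container from a persisted
/// primary key column. The format (one decimal key per line) is an
/// assumption made only for this sketch.
fn rebuild_pk_set<R: BufRead>(column: R) -> io::Result<HashSet<i64>> {
    let mut set = HashSet::new();
    for line in column.lines() {
        let line = line?;
        let key: i64 = line
            .trim()
            .parse()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        set.insert(key);
    }
    Ok(set)
}

fn main() -> io::Result<()> {
    // On a real restart this would be a BufReader<File> over the column file.
    let column = Cursor::new("3\n1\n2\n");
    let set = rebuild_pk_set(column)?;
    println!("{} keys recovered", set.len()); // prints "3 keys recovered"
    Ok(())
}
```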

After some investigation, the roaring bitmap looks like a good fit: it is a compressed bitmap index with excellent performance and low memory usage.
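
To illustrate why roaring bitmaps stay compact, here is a simplified, std-only sketch of the core idea. Each 32-bit value is split into its high 16 bits, which select a chunk, and its low 16 bits, which go into a per-chunk container: a sorted u16 array while the chunk is sparse, switching to a 65536-bit bitmap once it holds more than 4096 values. This is not roaring-rs; the real format also has run containers, which are omitted here, and MiniRoaring and Container are hypothetical names.

```rust
use std::collections::BTreeMap;

// Above 4096 values a u16 array (2 bytes each) is larger than an 8 KiB bitmap.
const ARRAY_MAX: usize = 4096;

enum Container {
    Array(Vec<u16>),          // sorted low 16 bits, for sparse chunks
    Bitmap(Box<[u64; 1024]>), // 1024 * 64 = 65536 bits, for dense chunks
}

impl Container {
    /// Inserts `low`; returns true if it was not already present.
    fn insert(&mut self, low: u16) -> bool {
        match self {
            Container::Array(v) => match v.binary_search(&low) {
                Ok(_) => false,
                Err(pos) => {
                    v.insert(pos, low);
                    if v.len() > ARRAY_MAX {
                        // Convert to a bitmap once the array grows too large.
                        let mut bits = Box::new([0u64; 1024]);
                        for &x in v.iter() {
                            bits[(x >> 6) as usize] |= 1u64 << (x & 63);
                        }
                        *self = Container::Bitmap(bits);
                    }
                    true
                }
            },
            Container::Bitmap(bits) => {
                let (word, mask) = ((low >> 6) as usize, 1u64 << (low & 63));
                let was_set = bits[word] & mask != 0;
                bits[word] |= mask;
                !was_set
            }
        }
    }

    fn contains(&self, low: u16) -> bool {
        match self {
            Container::Array(v) => v.binary_search(&low).is_ok(),
            Container::Bitmap(bits) => bits[(low >> 6) as usize] & (1u64 << (low & 63)) != 0,
        }
    }
}

#[derive(Default)]
struct MiniRoaring {
    // Chunks keyed by the high 16 bits of each value.
    chunks: BTreeMap<u16, Container>,
}

impl MiniRoaring {
    /// Returns true if `value` was newly inserted (false means duplicate).
    fn insert(&mut self, value: u32) -> bool {
        let (high, low) = ((value >> 16) as u16, value as u16);
        self.chunks
            .entry(high)
            .or_insert_with(|| Container::Array(Vec::new()))
            .insert(low)
    }

    fn contains(&self, value: u32) -> bool {
        let (high, low) = ((value >> 16) as u16, value as u16);
        self.chunks.get(&high).map_or(false, |c| c.contains(low))
    }

    /// Number of chunks currently stored as bitmaps (for inspection).
    fn bitmap_chunks(&self) -> usize {
        self.chunks.values().filter(|c| matches!(c, Container::Bitmap(_))).count()
    }
}

fn main() {
    let mut set = MiniRoaring::default();
    set.insert(42);
    for v in 0..5000u32 {
        set.insert(v); // chunk 0 becomes dense and switches to a bitmap
    }
    println!("bitmap chunks: {}", set.bitmap_chunks()); // prints "bitmap chunks: 1"
}
```

The insert return value doubles as the duplicate check the primary key needs, in the same way HashSet::insert does.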

We can use RoaringBitmap (32-bit keys) and RoaringTreemap (64-bit keys) from roaring-rs to store ordinary integer primary keys. String keys, which roaring bitmaps cannot store, can be kept in a HashSet instead.
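
Choosing the container by key type could be expressed as an enum, as in this sketch. In roaring-rs, RoaringBitmap holds u32 values and RoaringTreemap holds u64 values; to keep the sketch buildable with std only, a BTreeSet<u64> stands in for RoaringTreemap. PkValue and PkSet are hypothetical names.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical primary key value; only the two cases discussed above.
#[derive(Debug, Clone, PartialEq)]
enum PkValue {
    Int(u64),
    Str(String),
}

/// Per-table deduplication container, chosen by the primary key column type.
/// BTreeSet<u64> is a std stand-in for roaring::RoaringTreemap.
enum PkSet {
    Int(BTreeSet<u64>),
    Str(HashSet<String>),
}

impl PkSet {
    /// Ok(true): key was new. Ok(false): duplicate, so skip the row.
    /// Err: the key type does not match the column type.
    fn insert(&mut self, key: PkValue) -> Result<bool, String> {
        match (self, key) {
            (PkSet::Int(set), PkValue::Int(k)) => Ok(set.insert(k)),
            (PkSet::Str(set), PkValue::Str(k)) => Ok(set.insert(k)),
            (_, other) => Err(format!("key {:?} does not match the column type", other)),
        }
    }
}

fn main() {
    let mut ids = PkSet::Int(BTreeSet::new());
    assert_eq!(ids.insert(PkValue::Int(7)), Ok(true));
    assert_eq!(ids.insert(PkValue::Int(7)), Ok(false)); // duplicate

    let mut names = PkSet::Str(HashSet::new());
    assert_eq!(names.insert(PkValue::Str("alice".into())), Ok(true));
    assert!(ids.insert(PkValue::Str("x".into())).is_err());
}
```

Keeping the dispatch in one enum means the insert path only needs the table's container, not knowledge of the key type.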

Open question: where is the right place to keep each table's deduplication container? Could it be placed in the MetaStore?

  • SQL parsing
  • Deduplication by primary key on insert
  • Recovery
  • Performance testing