Csv datastore simplification #1043

Merged
idiom-bytes merged 4 commits into issue685-duckdb-integration from csv-ds-simplification on May 15, 2024


calina-c (Contributor) commented May 15, 2024

This follows up on the booleans issue (ref #1015 and possibly #973).

Added rechunks and simplified data management for identifiers. Also added a utility to create CSV identifiers directly from tables, while keeping CSVDataStore for inspection purposes.

calina-c changed the base branch from main to issue685-duckdb-integration May 15, 2024 09:03
calina-c self-assigned this May 15, 2024
calina-c marked this pull request as ready for review May 15, 2024 10:01
KatunaNorbert (Member) left a comment


LGTM

calina-c (Contributor, Author)

Added the last commit based on suggestions from Mustafa. @kdetry, please check whether it looks good to you without the instances part. There was also no point in keeping the high-level data store; the new identifiers handle that just fine.

idiom-bytes (Member) commented May 15, 2024

Hi, thanks for simplifying the process so that CSV signatures can be built from tables.

But why did the class have to be renamed?

Not only does that seem unnecessary, it's also unfortunate. I've been trying to deprecate the name "dataset_identifier" for a long time, and now I'm seeing it used even more. We should just call it a key, signature, or table_name, which suggests a unique value, and avoid creating new terminology unless it's needed.

idiom-bytes merged commit 33a40e9 into issue685-duckdb-integration May 15, 2024
4 checks passed
idiom-bytes deleted the csv-ds-simplification branch May 15, 2024 20:24
3 participants