Build arrow records without converting to parquet for sorting #4299

Open
wants to merge 1 commit into
base: main

brancz
Member

@brancz brancz commented Feb 7, 2024

No description provided.

alwaysmeticulous bot commented Feb 7, 2024

✅ Meticulous spotted zero visual differences across 403 screens tested: view results.

Last updated for commit 04a59f2. This comment will update as new commits are pushed.

Member

@metalmatze metalmatze left a comment

Fantastic! 🥳 I've been looking forward to this day! 🎉

Contributor

@asubiotto asubiotto left a comment

I can't tell you how happy I am to see this.

return nil, err
sortingColumns := []arrowutils.SortingColumn{}
arrowFields := as.Fields()
for _, col := range schema.SortingColumns() {
Contributor

Is there a way to avoid constructing this sorting schema on every record in the happy case? This might not be worth optimizing at this stage, especially since this is likely already a huge improvement.
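
To make the idea concrete, here is a rough sketch of memoizing the sorting columns per field layout, so the happy case (many records sharing one schema) only builds them once. Everything in it is hypothetical: `SortingColumn` is a stand-in for `arrowutils.SortingColumn`, `sortingCache` and its `build` callback are invented for illustration, and keying the cache on the joined field names is an assumption that ignores field types.

```go
package main

import (
	"strings"
	"sync"
)

// SortingColumn is a hypothetical stand-in for arrowutils.SortingColumn.
type SortingColumn struct {
	Index int
}

// sortingCache memoizes the sorting columns computed for a field layout.
// It is an illustrative type, not part of the PR.
type sortingCache struct {
	mu    sync.Mutex
	cache map[string][]SortingColumn
	build func(fieldNames []string) []SortingColumn
}

func newSortingCache(build func([]string) []SortingColumn) *sortingCache {
	return &sortingCache{
		cache: make(map[string][]SortingColumn),
		build: build,
	}
}

// get returns the cached sorting columns for fieldNames and only calls
// build the first time a given layout is seen.
func (c *sortingCache) get(fieldNames []string) []SortingColumn {
	// Assumption: the ordered field names identify the layout well enough.
	key := strings.Join(fieldNames, "\x00")

	c.mu.Lock()
	defer c.mu.Unlock()

	if cols, ok := c.cache[key]; ok {
		return cols
	}
	cols := c.build(fieldNames)
	c.cache[key] = cols
	return cols
}
```

The cache in this sketch grows without bound, so with many distinct dynamic column sets it would need some form of eviction.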

Member Author

Yeah, I saw both of these optimization opportunities as well, but they don't appear to be worth it at the moment.

panic(fmt.Sprintf("unknown column %v", as.Field(i).Name))
}
if colDef.Dynamic {
for i, c := range arrowFields {
Contributor

It's kind of annoying to have to iterate over all fields for each dynamic column. What if we built a map keyed by field prefix so this becomes an O(1) lookup instead? Again, not sure it's worth optimizing.
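
A minimal sketch of the map, assuming dynamic columns show up as Arrow fields named `<column>.<key>` (for example `labels.a`) and that the column name itself contains no `.`. The helper `groupFieldsByPrefix` is hypothetical and works on plain field names rather than real Arrow fields.

```go
package main

import "strings"

// groupFieldsByPrefix indexes field positions by the part of the name
// before the first '.'. Names without a '.' are skipped, on the assumption
// that they are not concrete columns of a dynamic column.
func groupFieldsByPrefix(fieldNames []string) map[string][]int {
	byPrefix := make(map[string][]int)
	for i, name := range fieldNames {
		prefix, _, found := strings.Cut(name, ".")
		if !found {
			continue
		}
		byPrefix[prefix] = append(byPrefix[prefix], i)
	}
	return byPrefix
}
```

Building the map is a single pass over the fields. After that, each dynamic sorting column can fetch its field indices with `byPrefix[colDef.Name]` instead of rescanning every field.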

@metalmatze
Member

Anything specific this is blocked on?

@brancz
Member Author

brancz commented Apr 11, 2024

A benchmark suggested that this was significantly slower than the previous approach.

3 participants