New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build arrow records without converting to parquet for sorting #4299
base: main
Are you sure you want to change the base?
Conversation
✅ Meticulous spotted zero visual differences across 403 screens tested: view results. Last updated for commit 04a59f2. This comment will update as new commits are pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! 🥳 I've been looking forward to this day! 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't tell you how happy I am to see this
return nil, err | ||
sortingColumns := []arrowutils.SortingColumn{} | ||
arrowFields := as.Fields() | ||
for _, col := range schema.SortingColumns() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid constructing this sorting schema on every record in the happy case? This might not be worth optimizing at this stage especially since this is already likely a huge improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I saw both of these optimization opportunities as well, but they don't appear worth it at the moment.
panic(fmt.Sprintf("unknown column %v", as.Field(i).Name)) | ||
} | ||
if colDef.Dynamic { | ||
for i, c := range arrowFields { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of annoying to have to iterate over all fields for each dynamic column. What if we created a map with the field prefix so this can instead be an O(1) lookup? Again, not sure if worth optimizing.
Anything specific this is blocked on? |
A benchmark suggested that this was significantly slower than the previous approach. |
No description provided.