Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: better way to determine fetch_size #505

Open
skyzh opened this issue Feb 19, 2022 · 5 comments
Open

storage: better way to determine fetch_size #505

skyzh opened this issue Feb 19, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request

Comments

@skyzh
Copy link
Member

skyzh commented Feb 19, 2022

Currently, the logic is like:

let mut fetch_size = {
// We find the minimum fetch hints from the column iterators first
let mut min = None;
for it in &self.column_iterators {
let hint = it.fetch_hint();
if hint != 0 {
if min.is_none() {
min = Some(hint);
} else {
min = Some(min.unwrap().min(hint));
}
}
}
min.unwrap_or(ROWSET_MAX_OUTPUT)
};

If there is any hint from column iterators, we choose the minimum of remaining items N, so that we can issue as few I/O as possible for this round. This number N can be:

  • very large (e.g., >= 200k)
  • not accurate (e.g., there could be columns with remaining_items == 0, and they have to issue I/O for this round. We cannot determine the best fetch_size for now.)
  • not limited by ROWSET_MAX_OUTPUT. This value will only be used when there's no hint from any of the columns. The naming of this const is somehow misleading 🤪

We should find a better way to determine fetch_size. e.g., by removing fetch_hint interface and solely base on the column index information.

@skyzh skyzh added the enhancement New feature or request label Feb 19, 2022
@eliasyaoyc
Copy link
Contributor

My idea is that a fetch size vec corresponds to the fetch size for each column. if a certain size is 0, it will be skipped directly, and no io is needed. 🥺

Is that okay with you? @skyzh

@skyzh
Copy link
Member Author

skyzh commented Oct 14, 2022

Could you please elaborate it in more details? Note that we must fetch equal size of rows for all columns.

@eliasyaoyc
Copy link
Contributor

old -> [col1_fetch_size= 15, col2_fetch_size= 10, col3_fetch_size= 0] will use ROWSET_MAX_OUTPUT directly.
my opinion is: [col1_fetch_size= 15, col2_fetch_size= 10, col3_fetch_size= 0] -> [col1_fetch_size= 10, col2_fetch_size= 10, col3_fetch_size= 10] so we get the minimum value is 10 and compare ROWSET_MAX_OUTPUT

@skyzh
Copy link
Member Author

skyzh commented Oct 15, 2022

my opinion is: [col1_fetch_size= 15, col2_fetch_size= 10, col3_fetch_size= 0] -> [col1_fetch_size= 10, col2_fetch_size= 10, col3_fetch_size= 10]

That's exactly what I want! Feel free to send a PR. If all of them are 0 (or None), we can use ROWSET_MAX_OUTPUT directly.

@eliasyaoyc
Copy link
Contributor

/assignme

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants