Include file stats when converting a parquet directory to a Delta table #2490

Closed
gruuya opened this issue May 8, 2024 · 0 comments · Fixed by #2491
Labels
enhancement New feature or request

Comments

@gruuya
Contributor

gruuya commented May 8, 2024

Description

Currently the ConvertToDeltaBuilder skips fetching and populating the stats when it builds each Add action:

Add {
    path: percent_decode_str(file.location.as_ref())
        .decode_utf8()?
        .to_string(),
    size: i64::try_from(file.size)?,
    partition_values: partition_values
        .into_iter()
        .map(|(k, v)| {
            (
                k,
                if v.is_null() {
                    None
                } else {
                    Some(v.serialize())
                },
            )
        })
        .collect(),
    modification_time: file.last_modified.timestamp_millis(),
    data_change: true,
    ..Default::default()
}

This results in log files missing the min/max/null count statistics.
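
For context, the Delta protocol stores per-file statistics as a JSON-encoded string in the `stats` field of each `add` action, with the keys `numRecords`, `minValues`, `maxValues` and `nullCount`. The sketch below shows roughly what such a string looks like. `ColumnStats`, `json_object` and `stats_json` are hypothetical names made up for illustration (not delta-rs APIs), and only integer columns with plain names are handled.

```rust
/// Per-column statistics for a single data file.
/// Hypothetical helper type for illustration; not part of delta-rs.
struct ColumnStats {
    name: String,
    min: i64,
    max: i64,
    null_count: i64,
}

/// Render `{"col":value,...}` for one statistic, picked by `value_of`.
/// Assumes column names need no JSON escaping.
fn json_object(columns: &[ColumnStats], value_of: fn(&ColumnStats) -> i64) -> String {
    let pairs: Vec<String> = columns
        .iter()
        .map(|c| format!("\"{}\":{}", c.name, value_of(c)))
        .collect();
    format!("{{{}}}", pairs.join(","))
}

/// Build the JSON string that would go into the `stats` field of an `add`
/// action, using the key names defined by the Delta protocol.
fn stats_json(num_records: i64, columns: &[ColumnStats]) -> String {
    format!(
        "{{\"numRecords\":{},\"minValues\":{},\"maxValues\":{},\"nullCount\":{}}}",
        num_records,
        json_object(columns, |c| c.min),
        json_object(columns, |c| c.max),
        json_object(columns, |c| c.null_count),
    )
}
```

In the snippet from the issue, the resulting string would be passed as the stats value instead of leaving it to `..Default::default()`. A real implementation would use a JSON library and support all column types.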

Use Case

These stats are useful because they allow query engines to prune files whose min/max ranges cannot match a filter, which improves read performance.

Granted, it may be possible to use the stats from the Parquet files themselves, but that is sub-optimal compared to reading them from the log directly.
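
To illustrate why the missing stats matter, here is a minimal sketch of min/max-based pruning for a filter of the form `column > threshold`. `FileEntry` and `files_to_scan` are hypothetical names, not delta-rs APIs, and the column is assumed to be an integer. A file with no stats in the log can never be skipped, which is the situation a converted table is in today.

```rust
/// One data file as seen by a reader of the log.
/// `range` holds the (min, max) of the filtered column, or `None` when the
/// log entry carries no stats. Hypothetical type for illustration.
struct FileEntry {
    path: String,
    range: Option<(i64, i64)>,
}

/// Return the paths of files that may contain rows with `column > threshold`.
fn files_to_scan(files: &[FileEntry], threshold: i64) -> Vec<&str> {
    files
        .iter()
        .filter(|f| match f.range {
            // With stats: the file can only match if its max exceeds the threshold.
            Some((_, max)) => max > threshold,
            // Without stats: nothing is known, so the file must be scanned.
            None => true,
        })
        .map(|f| f.path.as_str())
        .collect()
}
```

With stats recorded in the log, this decision can be made without opening any Parquet footers.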

Related Issue(s)

@gruuya gruuya added the enhancement New feature or request label May 8, 2024
ion-elgreco pushed a commit that referenced this issue May 15, 2024
# Description
Collect stats during conversion of a parquet dir to a Delta table and
add to the actions.

# Related Issue(s)
Closes #2490 

# Documentation