Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(scheduler): add csv support for get_file_metadata grpc method #1011

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

etolbakov
Copy link
Contributor

Which issue does this PR close?

Closes #.
I was exploring the code and came across the TODO about the csv file format support. So decided to address it.

Rationale for this change

The functionality extension.

What changes are included in this PR?

  • add csv / tlb file format support
  • fix minor typos

Are there any user-facing changes?

No

@@ -297,13 +298,24 @@ impl<T: 'static + AsLogicalPlan, U: 'static + AsExecutionPlan> SchedulerGrpc
// TODO shouldn't this take a ListingOption object as input?

let GetFileMetadataParams { path, file_type } = request.into_inner();
let file_format: Arc<dyn FileFormat> = match file_type.as_str() {
let file_format: Result<Arc<dyn FileFormat>, Status> = match file_type.as_str() {
"parquet" => Ok(Arc::new(ParquetFormat::default())),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that in the benchmark there's slightly different default for parquet:

ParquetFormat::default().with_enable_pruning(Some(true))

I wonder if we need consistency or it's ok to leave it as is?

_ => Err(tonic::Status::unimplemented(
"get_file_metadata unsupported file type",
)),
}?;
Copy link
Contributor Author

@etolbakov etolbakov Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to drop the "short circuit" from here to resolve the rust compilation error:

Type mismatch [E0308] expected `Result<Arc<CsvFormat>, Status>`, but found `Result<Arc<ParquetFormat>, Status>` 

please let me know if there's a better way of doing it?

@etolbakov
Copy link
Contributor Author

etolbakov commented Apr 29, 2024

@andygrove
Could you please take a look at this change when you have time?
Provided that there were no open ticket (but only todo in the code) I understand that maybe this change is of no use.

UPD: I will resolve the failing checks if we decide to proceed with this PR.

@etolbakov
Copy link
Contributor Author

@andygrove just a ping in case this notification is lost.
Could you please take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant