Feature request: Kaggle dataset support #214

Open
NeroBlackstone opened this issue Jul 18, 2023 · 2 comments

Comments

@NeroBlackstone

Kaggle hosts many datasets, most of them in CSV format.

Would it make sense to add support for downloading Kaggle datasets directly in MLDatasets.jl?

For example, to download the House Prices 2023 Dataset:

Step 1: Get the kaggle.json file, or set the username and key manually.

username = "neroblackstone"
key = "key"

or download kaggle.json to ~/.kaggle/
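A minimal sketch of how the credential lookup in Step 1 could work, using only the Julia standard library. The function name `kaggle_credentials` and its keyword arguments are hypothetical. It assumes kaggle.json is a flat JSON object with string fields `username` and `key`, and that the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables take precedence when both are set. The regex extraction is a shortcut to keep the sketch dependency-free; a real implementation would likely use a proper JSON parser.

```julia
# Hypothetical helper: resolve Kaggle credentials, preferring environment
# variables and falling back to a kaggle.json file in `config_dir`.
function kaggle_credentials(; config_dir = joinpath(homedir(), ".kaggle"), env = ENV)
    # Environment variables win when both are present.
    if haskey(env, "KAGGLE_USERNAME") && haskey(env, "KAGGLE_KEY")
        return (username = env["KAGGLE_USERNAME"], key = env["KAGGLE_KEY"])
    end

    path = joinpath(config_dir, "kaggle.json")
    isfile(path) || error("no Kaggle credentials found: set KAGGLE_USERNAME and KAGGLE_KEY, or create $path")
    text = read(path, String)

    # Naive extraction of a flat string field (assumption: no escaped quotes
    # inside the values). This is not a general JSON parser.
    field(name) = begin
        m = match(Regex("\"$name\"\\s*:\\s*\"([^\"]*)\""), text)
        m === nothing && error("field \"$name\" missing in $path")
        String(m.captures[1])
    end

    return (username = field("username"), key = field("key"))
end
```

Calling `kaggle_credentials()` with no arguments would look in ~/.kaggle/; passing `config_dir` makes the lookup easy to test against a temporary directory.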

Step 2: Download

# Download the dataset to the default path and extract the CSV files.
files_path = kaggle_download("howisusmanali/house-prices-2023-dataset")

Step 3: Processing

using CSV
using DataFrames

file_path = joinpath(files_path, "csv_we_want.csv")
# Pass the path directly so no file handle is left open.
data = CSV.read(file_path, DataFrame)

Implementation:

  • Call the Python Kaggle API through PyCall, which is a rather heavy dependency.
  • Or call the Kaggle REST API directly from Julia, which is more lightweight but a bit harder to implement.

What are your thoughts? Do you think this feature makes sense?
I can implement it myself and open a PR.
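To give an idea of the second option, here is a rough sketch that uses only the `Base64` and `Downloads` standard libraries. The base URL, the `/datasets/download/` path, and the use of HTTP Basic authentication with the username and key are assumptions about the Kaggle REST API that would need to be checked against its specification. All function names are hypothetical.

```julia
using Base64
using Downloads

# Assumed base URL of the Kaggle REST API (not confirmed in this thread).
const KAGGLE_API_BASE = "https://www.kaggle.com/api/v1"

# Build the (assumed) download URL for a dataset slug such as "owner/dataset-name".
kaggle_dataset_url(slug::AbstractString) =
    string(KAGGLE_API_BASE, "/datasets/download/", slug)

# HTTP Basic auth header built from the username and key.
function kaggle_auth_header(username::AbstractString, key::AbstractString)
    token = base64encode(string(username, ":", key))
    return "Authorization" => "Basic $token"
end

# Hypothetical helper: download the dataset archive into `dir` and return its
# path. The archive is assumed to be a zip file; extraction is left out.
function kaggle_download_archive(slug, username, key; dir = mktempdir())
    archive = joinpath(dir, replace(slug, "/" => "_") * ".zip")
    Downloads.download(kaggle_dataset_url(slug), archive;
                       headers = [kaggle_auth_header(username, key)])
    return archive
end
```

Extracting the CSV files from the archive would still need a zip library, since Julia's standard library does not include one.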

@CarloLucibello
Member

This would be great to have! We have to go with the REST API though; so far we have managed to avoid the PyCall dependency.

@NeroBlackstone
Author

NeroBlackstone commented Sep 23, 2023

I'm trying to implement the complete Kaggle API in Julia.

Since Python's Kaggle API is generated from an OpenAPI specification, I also want to use OpenAPI.jl to generate a Julia Kaggle API.

However, OpenAPI.jl does not yet fully support file downloads. Once that feature is implemented, I will continue working on kaggle.jl.
