This repository has been archived by the owner on Nov 1, 2021. It is now read-only.

many-examples: remove kaggle dependency #544

Open
4 tasks
alexcg1 opened this issue May 4, 2021 · 4 comments
Labels
area/examples (This ticket affects an example), good first issue (Good for newcomers)

Comments

@alexcg1
Member

alexcg1 commented May 4, 2021

As discussed in various meetings with @lusloher, @aga11313, @FionnD

Kaggle is a lot of hoops for a user to jump through just to get an example working: install, set up key, run data getter script.

It's also work for us: We have to ensure datasets haven't moved or changed a lot, and we sometimes have to perform extra steps to process them.

These datasets are generally under creative commons licenses or similar. There's no reason why we can't:

  • Download a subset for example purposes (this keeps things light)
  • Process that subset ourselves (saves users time and effort)
  • Store it either in data/ (for light stuff like text which can go directly in repo) or use get_data.sh to download from somewhere we control (for larger stuff like images)
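The three steps above could be sketched roughly as follows for a text dataset. This is only an illustration, not the project's actual tooling: the function name `make_example_subset`, the CSV input format, and the file paths are all assumptions, and real examples may need dataset-specific processing.

```python
import csv
import random
from pathlib import Path


def make_example_subset(source_csv, dest_csv, n_rows=1000, seed=42):
    """Sample up to n_rows rows from source_csv and write them to dest_csv.

    Hypothetical helper: a maintainer would run this once on the full
    dataset and commit the (small) result, so users never touch Kaggle.
    Returns the number of data rows written.
    """
    with open(source_csv, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Minimal "processing" step: drop rows where every cell is empty.
        rows = [row for row in reader if any(cell.strip() for cell in row)]

    # A fixed seed keeps the committed subset reproducible.
    random.Random(seed).shuffle(rows)
    subset = rows[:n_rows]

    dest = Path(dest_csv)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(subset)
    return len(subset)
```

For instance, `make_example_subset("full_dump.csv", "data/subset.csv", n_rows=500)` would produce a file small enough to commit under data/ (both paths are placeholders).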

Affected examples

  • wikipedia-sentences
  • multires-lyrics-search
  • cross-modal-search
  • query-while-indexing
alexcg1 added the good first issue (Good for newcomers) and area/examples (This ticket affects an example) labels on May 4, 2021
@FionnD
Contributor

FionnD commented May 4, 2021

Thanks for creating the issue Alex!

Just to clarify for any engineer: ⚠️ This issue should not be worked on until #447 and #512 are completed. ⚠️

@nan-wang
Member

audio-search no longer has a dependency on Kaggle.

@jakobkruse1
Contributor

Where could we store the example data? Do we have "somewhere we control" to download from?

@tadejsv
Contributor

tadejsv commented Aug 5, 2021

I propose to use, when possible, huggingface datasets. They are extremely easy to use, and very performant too.
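As a rough idea of what this could look like in an example's data-loading code, here is a minimal sketch. It assumes the `datasets` package is installed and that a suitable dataset exists on the Hugging Face Hub; the function name `load_example_subset` and the `loader` parameter are made up for illustration, and the dataset name is left to the caller.

```python
def load_example_subset(dataset_name, n_rows=100, split="train", loader=None):
    """Load only the first n_rows of a split, keeping the example light.

    Hypothetical helper. `loader` exists only so the function can be
    exercised without network access; by default it is
    `datasets.load_dataset` from the Hugging Face `datasets` package.
    """
    if loader is None:
        # Imported lazily so the package is only required when actually used.
        from datasets import load_dataset
        loader = load_dataset
    # Split slicing such as "train[:100]" asks for a subset of the split.
    return loader(dataset_name, split=f"{split}[:{n_rows}]")
```

Whether every affected example has an equivalent dataset on the Hub would still need to be checked case by case.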


7 participants