academic torrent and availability of missing files #21

Open
benearnthof opened this issue Dec 3, 2023 · 11 comments

Comments

@benearnthof

Hello and thank you for this extremely useful repository!

I'm currently downloading all of the raw files to compile a .torrent, so that this dataset can be made available at https://academictorrents.com/ and this work is truly future-proofed. I was wondering whether the missing files (videos that have since been deleted, etc.) are available for download somewhere. I've read through #8, but I'm unable to use the Chinese download provider mentioned there, and I don't believe the raw files are hosted there anyway.

Could you provide raw files or link to a backup somewhere?

I'll compile the .torrent with raw and processed files in the coming week but my download is slow and may take another 4-5 days.

Please let me know if the missing files are available anywhere else apart from #8.

Best regards.

@benearnthof
Author

I've created a complete academic torrent here:
https://academictorrents.com/details/843b5adb0358124d388c4e9836654c246b988ff4

I will be seeding this for the next year. Please mention me if you are unable to download the torrent, and I will reseed if needed. The upload is confirmed working by the public Academic Torrents mirror in Florida. My upload rate is limited to 60 Mbit/s, and I will only seed at night, from 22:00 Central European Time until 06:00 the following day.

@fenghe12

Thanks! But I can't download from your OneDrive link. Is there anything wrong with it?

@johndpope

magnet:?xt=urn:btih:843b5adb0358124d388c4e9836654c246b988ff4&dn=CelebV-HQ&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=https%3A%2F%2Fipv6.academictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce

Try this magnet link, @fenghe12. It downloaded fine for me the other day. FYI, I'm doing some groundwork for the EMO paper (slow progress): https://github.com/johndpope/Emote-hack

@fenghe12

OK, thanks a lot! By the way, I'm also studying EMO, so maybe I can learn from your repo.

@aixiaodewugege

Thanks for the link. However, some videos don't have audio. Has anybody else noticed that?

@triton99

Hi @johndpope,
I downloaded it and found that some videos don't have audio (for example, _CyaF_EymBU_38_0). Could you recheck it?
Thank you.

@johndpope

Hi @triton99, it will take me a few days, as I'm away on holiday. In the meantime, @francqz31 dug up celebvox2:
johndpope/VASA-1-hack#5 (comment)

The way to fix the missing audio is to create an errata gist that lists all of the affected videos. I saw this done on another dataset.

@triton99

Hi @johndpope,

I found that 6896 of the 35666 videos (≈19.33%) don't have audio. Do you mean that we should remove the videos without audio from training and validation? And could you point me to the dataset or project that handled it this way?

Thank you very much for the information.
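
For anyone who wants to reproduce a count like the one above, here is a minimal sketch of one way to find clips without an audio stream. It is not the method used in this thread (that was not stated); it assumes `ffprobe` from FFmpeg is installed and on the PATH, and that the clips are `.mp4` files. The function names and the directory layout are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def streams_include_audio(probe: dict) -> bool:
    """Return True if parsed ffprobe JSON lists at least one audio stream."""
    return any(s.get("codec_type") == "audio" for s in probe.get("streams", []))


def has_audio(path: Path) -> bool:
    """Ask ffprobe (assumed to be on PATH) for the stream types in one file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type",
         "-of", "json", str(path)],
        capture_output=True, text=True, check=True,
    )
    return streams_include_audio(json.loads(result.stdout))


def list_silent_videos(root: Path) -> list[str]:
    """Collect the names (without extension) of .mp4 files lacking audio.

    Assumption: the clips are .mp4 files somewhere under `root`.
    """
    return sorted(p.stem for p in root.rglob("*.mp4") if not has_audio(p))
```

Calling `list_silent_videos(Path("path/to/clips"))` would return a sorted list of clip names that could be written to a text file or pasted into a gist.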

@johndpope

johndpope commented Apr 27, 2024

mdeff/fma#70

@triton99, put the list into a gist. I suspect it may be due to music licensing. It may be possible to redownload these 6896 files.

@triton99

@johndpope, here is the list of videos without audio. Can you show me how to redownload them with audio? I don't understand how to put the list into a gist. Thank you for your help.

no_audio_files.txt

@johndpope

johndpope commented Apr 27, 2024

You can create the gist at gist.github.com.
Each file name corresponds to a YouTube video ID:

https://www.youtube.com/watch?v=<filename>

If you use yt-dlp (linked below), you can pass the ID in directly.

https://github.com/yt-dlp/yt-dlp
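
As a rough sketch of that suggestion: the snippet below turns a clip name into a YouTube URL and hands it to the `yt-dlp` command-line tool. It assumes clip names follow the pattern seen in this thread (`_CyaF_EymBU_38_0`, which looks like an 11-character video ID followed by two numeric suffixes); that pattern, the function names, and the `redownload` output directory are assumptions, not something the dataset authors have confirmed here.

```python
import re
import subprocess
from pathlib import Path

# Assumption: clip names look like "<11-character YouTube ID>_<number>_<number>".
CLIP_NAME = re.compile(r"^(?P<video_id>[A-Za-z0-9_-]{11})_\d+_\d+$")


def video_id_from_clip(name: str) -> str:
    """Extract the YouTube video ID from a clip name (a file extension is ignored)."""
    stem = Path(name.strip()).stem
    match = CLIP_NAME.match(stem)
    if match is None:
        raise ValueError(f"unexpected clip name: {name!r}")
    return match.group("video_id")


def watch_url(video_id: str) -> str:
    """Build the standard YouTube watch URL for a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def redownload(clip_name: str, out_dir: str = "redownload") -> None:
    """Fetch the full source video with yt-dlp (assumed to be installed)."""
    url = watch_url(video_id_from_clip(clip_name))
    # -o sets yt-dlp's output template; %(id)s and %(ext)s are its own fields.
    subprocess.run(["yt-dlp", "-o", f"{out_dir}/%(id)s.%(ext)s", url], check=True)
```

For example, `redownload("_CyaF_EymBU_38_0")` would try to fetch `https://www.youtube.com/watch?v=_CyaF_EymBU`. Note that this downloads the whole source video rather than the trimmed clip, and it will fail for videos that have been deleted or made private.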
