
SukaSuka anime vocal dataset (すかすかアニメボカロデータセット). 1st anime vocal dataset. Extracts audio (vocal) clips from video based on .ass subtitle files; the clips are then manually labeled by character. Intended for PITS/VITS/Diffusion text-to-speech/SVC.


Hecate2/sukasuka-vocal-dataset-builder


The Python code in this repo is licensed under the MIT License. Be aware that the anime, the subtitles, and third-party tools and packages (e.g. ffmpeg) may carry other licenses.

Salute to all the contributors!

Episodes 09 & 10 labeled by 亡絮开始·祖安钢琴师

Episodes 11 & 12 labeled by 喵る桑

Drama CD 01 subtitled & labeled by camimo

Experimental synthesis (see the .mp3 & .flac files in the release) and model training performed by Aya.

TTS model using ESPnet by mio.

Dataset of Chtholly checked by mio; Ithea checked by camimo.

If you are going to train your own model, note that mio has released a further cleaned version of the dataset on huggingface.co, with non-vocal sounds removed using demucs. My releases here STILL INCLUDE NON-VOCAL SOUNDS.

contributions-banner

(Image created by Carzit using AI)

Contribution guide for potential Chthollists: see the tasks below!

All kinds of contributions from anyone are welcome, but an ideal contributor would:

  • [THIS IS THE MOST IMPORTANT!] be familiar with the SukaSuka characters, especially their voices and personalities! At the very least you need to know their names... (head to the releases to check the English names)
  • understand how AI models are trained, and why and how we are building datasets
  • know something about .csv, or other plain-text formats like .json that are designed for both humans and machines
  • know about GitHub, Hugging Face, Civitai, etc.
  • be able to read or write basic programs
  • be familiar with AI-ops

Please always open an issue describing what you are going to do before you start contributing, so that others do not repeat (or discover they have already repeated) your work and waste effort.

  • Verify meta.csv. There are surely mistakes.
  • Filter out non-vocal sounds in the dataset.
  • Mark vocal sounds that are not suitable for training in meta.csv. This requires some training experience. For example, a short, meaningless ああああ~ that strays far from the character's normal pitch may pollute the model.
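
As a starting point for the meta.csv verification task, a first automated pass could look roughly like the sketch below. It is not part of this repo. It only assumes the filename,character,content header described in this README. The function name check_meta and the optional known_characters argument are hypothetical, and judging whether a label is actually correct still requires listening to the clip.

```python
import csv
import io

# Header that this README says meta.csv must start with.
EXPECTED_HEADER = ["filename", "character", "content"]


def check_meta(csv_text, known_characters=None):
    """Return a list of human-readable problems found in meta.csv text.

    known_characters is an optional, hypothetical whitelist of names;
    when it is None, any non-empty character label is accepted.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append(f"line 1: header is {header}, expected {EXPECTED_HEADER}")
    seen = set()
    for row in reader:
        line_no = reader.line_num  # physical line number in the file
        if not row:
            continue  # blank line
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 fields, got {len(row)}")
            continue
        filename, character, _content = row
        if not character.strip():
            problems.append(f"line {line_no}: {filename} has no character label")
        elif known_characters is not None and character not in known_characters:
            problems.append(f"line {line_no}: unknown character {character!r}")
        if filename in seen:
            problems.append(f"line {line_no}: duplicate filename {filename}")
        seen.add(filename)
    return problems
```

To run it on a real file, read the text with something like open("meta.csv", encoding="utf-8-sig", newline="").read(), so that a byte-order mark left by a spreadsheet editor does not break the header comparison.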

How to build your dataset

Place your files like this

sukasuka-vocal-dataset-builder:
  get_voice_from_video_and_subtitles.py
  divide_by_character.py
  (Others...)
[MH&Airota&FZSD&VCB-Studio] Shuumatsu Nani Shitemasuka? Isogashii Desuka? Sukutte Moratte Ii Desuka? [Ma10p_1080p]:
  [MH&Airota&FZSD&VCB-Studio] sukasuka [01][Ma10p_1080p][x265_flac_aac].mkv
  (Others...)
[XKsub] 終末なにしてますか [简日·繁日双语字幕]:
  [XKsub] 終末なにしてますか chs_jap:
    Shuumatsu Nani Shitemasuka 01.chs_jap.ass
    (Others...)

Run get_voice_from_video_and_subtitles.py, then MANUALLY label the character for every clip in sukasuka-vocal-dataset-builder/meta.csv. The format is filename,character,content; check that the first line of your csv file is exactly filename,character,content. Finally, run divide_by_character.py.
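
To give a feel for what subtitle-driven extraction involves, here is a minimal sketch. It is not the code in get_voice_from_video_and_subtitles.py. It assumes the standard ASS event layout, in which a Dialogue line carries ten comma-separated fields with the start and end times in positions 2 and 3 and the text last; a real file declares its own field order in the Format line of its [Events] section. All function names and the file names in the usage note are hypothetical.

```python
def parse_dialogue(line):
    """Parse one ASS 'Dialogue:' line into (start, end, text), else None.

    Assumes the usual field order:
    Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text
    """
    if not line.startswith("Dialogue:"):
        return None
    # The text field may itself contain commas, so split at most 9 times.
    fields = line[len("Dialogue:"):].strip().split(",", 9)
    if len(fields) < 10:
        return None
    return fields[1].strip(), fields[2].strip(), fields[9]


def ass_time_to_seconds(timestamp):
    """Convert an ASS timestamp such as '0:01:02.50' to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def ffmpeg_clip_command(video, start, end, out_path):
    """Build an ffmpeg argument list that cuts [start, end] as audio only."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video,
        "-ss", f"{start:.2f}",   # clip start, in seconds
        "-to", f"{end:.2f}",     # clip end, in seconds
        "-vn",                   # drop the video stream
        out_path,
    ]
```

Each command could then be executed with subprocess.run(cmd, check=True), for example with start and end taken from parse_dialogue and converted by ass_time_to_seconds, and out_path set to a hypothetical name such as 01_0001.wav that is later recorded in the filename column of meta.csv.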

Drama CD dataset...

WIP. If you are interested, run drama_cd_transcript.py, and manually edit drama-cd-transcript/drama-cd-transcript.csv.

Data sources

subtitles: https://bbs.acgrip.com/thread-6124-1-1.html (with AGPLv3 & CC BY-NC-SA 4.0 licenses)

anime videos: magnet:?xt=urn:btih:a05ba5cf6182e0757288c377fe8c06606a0f6428&dn=%5bMH%26Airota%26FZSD%26VCB-Studio%5d%20Shuumatsu%20Nani%20Shitemasuka%ef%bc%9f%20Isogashii%20Desuka%ef%bc%9f%20Sukutte%20Moratte%20Ii%20Desuka%ef%bc%9f%20%5bMa10p_1080p%5d&tr=udp%3a%2f%2ftracker.publicbt.com%3a80%2fannounce&tr=http%3a%2f%2ftr.bangumi.moe%3a6969%2fannounce&tr=http%3a%2f%2ft.nyaatracker.com%2fannounce&tr=http%3a%2f%2fopen.acgtracker.com%3a1096%2fannounce&tr=http%3a%2f%2fopen.nyaatorrents.info%3a6544%2fannounce&tr=http%3a%2f%2ft2.popgo.org%3a7456%2fannonce&tr=http%3a%2f%2fshare.camoe.cn%3a8080%2fannounce&tr=http%3a%2f%2fopentracker.acgnx.se%2fannounce&tr=http%3a%2f%2ftracker.acgnx.se%2fannounce&tr=http%3a%2f%2fnyaa.tracker.wf%3a7777%2fannounce&tr=udp%3a%2f%2ftracker.openbittorrent.com%3a80%2fannounce&tr=http%3a%2f%2ft.acg.rip%3a6699%2fannounce&tr=udp%3a%2f%2ftracker.prq.to%3a80%2fannounce&tr=http%3a%2f%2fshare.dmhy.org%2fannonuce&tr=http%3a%2f%2ftracker.btcake.com%2fannounce&tr=http%3a%2f%2ftracker.ktxp.com%3a6868%2fannounce&tr=http%3a%2f%2ftracker.ktxp.com%3a7070%2fannounce&tr=udp%3a%2f%2fbt.sc-ol.com%3a2710%2fannounce&tr=http%3a%2f%2fbtfile.sdo.com%3a6961%2fannounce&tr=https%3a%2f%2ft-115.rhcloud.com%2fonly_for_ylbud&tr=http%3a%2f%2fexodus.desync.com%3a6969%2fannounce&tr=udp%3a%2f%2fcoppersurfer.tk%3a6969%2fannounce&tr=http%3a%2f%2ftracker3.torrentino.com%2fannounce&tr=http%3a%2f%2ftracker2.torrentino.com%2fannounce&tr=udp%3a%2f%2fopen.demonii.com%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.ex.ua%3a80%2fannounce&tr=http%3a%2f%2fpubt.net%3a2710%2fannounce&tr=http%3a%2f%2ftracker.tfile.me%2fannounce&tr=http%3a%2f%2fbigfoot1942.sektori.org%3a6969%2fannounce&tr=http%3a%2f%2fbt.sc-ol.com%3a2710%2fannounce
