Organizations

@AIGC-Audio

Rongjiehuang/README.md

Hi there 👋

I am Rongjie Huang (黄融杰). I did my graduate study at the College of Computer Science and Software, Zhejiang University, supervised by Prof. Zhou Zhao, and I also obtained my Bachelor's degree at Zhejiang University. During my graduate study, I was lucky to collaborate with the CMU Speech Team led by Prof. Shinji Watanabe and with the Audio Research Team at Zhejiang University. I am grateful to have interned or collaborated at TikTok, Shanghai AI Lab (OpenGV Lab), Tencent Seattle Lab, and Alibaba DAMO Academy, with Yi Ren, Jinglin Liu, Chunlei Zhang, and Dong Yu.

My research interests include Multi-Modal Generative AI, Multi-Modal Language Processing, and AI4Science. I have published first-author papers at top international AI conferences such as NeurIPS, ICLR, ICML, ACL, and IJCAI.

I am actively looking for academic collaborations; feel free to drop me an email.

📎 Homepages

💻 Selected Research Papers

Generative AI for Speech, Singing, and Audio: Spoken Large Language Model, Text-to-Audio Synthesis, Text-to-Speech Synthesis, Singing Voice Synthesis

Audio-Visual Language Processing: Audio-Visual Speech-to-Speech Translation, Self-Supervised Learning

My full paper list is available on my personal homepage.

Spoken Large Language Model

Text-to-Speech Synthesis

Text-to-Audio Synthesis

Audio-Visual Language Processing

Singing Voice Synthesis

Pinned

  1. AIGC-Audio/AudioGPT (Public)

    AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

    Python, 9.8k stars, 833 forks

  2. Text-to-Audio/Make-An-Audio (Public)

    PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model

    Python, 695 stars, 102 forks

  3. TranSpeech (Public)

    PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation

    Python, 158 stars, 23 forks

  4. GenerSpeech (Public)

    PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.

    Python, 305 stars, 44 forks

  5. FastDiff (Public)

    PyTorch Implementation of FastDiff (IJCAI'22)

    Python, 390 stars, 63 forks

  6. yangdongchao/AcademiCodec (Public)

    AcademiCodec: An Open Source Audio Codec Model for Academic Research

    Python, 497 stars, 69 forks