Feature Request: Support for Digital Avatars in Audio/Video and Text Communication #2573

Open
cubxxw opened this issue Mar 12, 2024 · 0 comments

cubxxw commented Mar 12, 2024

I have familiarized myself with the project at https://livekit.io/kitt, and I believe it shows how seamlessly LiveKit can integrate with the ChatGPT API. I am also curious whether LiveKit is preparing for the emergence of Sora.

I am exploring innovative ways to enhance user interaction within my application, particularly through the integration of digital avatars. These avatars could either be predefined virtual characters or dynamically generated based on user inputs, such as descriptions or images. The core idea is to facilitate audio/video and text-based communication between the user and these digital avatars, enriching the overall user experience.

Feature Description:

  • Digital Avatar Integration: Ability to integrate digital avatars that serve as the user's counterpart in communications. These avatars can be virtual characters (predefined) or created dynamically from user inputs (text descriptions, images, etc.).
  • Audio/Video Communication: Users should be able to engage in audio/video calls with these avatars, where the avatars can generate responses in real-time.
  • Text Communication: Alongside audio/video capabilities, the system should support text-based interactions between the user and the avatar.
  • Real-time Subtitles: For audio and video communications, real-time subtitles or captions that reflect the avatar's responses could greatly enhance accessibility and user understanding (a rough sketch of how this and the text chat above could ride on LiveKit data messages follows this list).
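
As a rough illustration of the text-chat and subtitle items above, here is a minimal sketch assuming the livekit-client JavaScript SDK. The topic names (avatar-chat, avatar-subtitles), the UI helpers, and the token variable are placeholders of mine, and the exact publishData options vary slightly between SDK versions:

```ts
import { Room, RoomEvent } from 'livekit-client';

const encoder = new TextEncoder();
const decoder = new TextDecoder();
const room = new Room();

// Placeholder UI hooks; a real app would update its chat and caption components here.
function showSubtitle(text: string) { console.log('[subtitle]', text); }
function appendChatMessage(from: string, text: string) { console.log(`[${from}]`, text); }

// Send a user chat message to the avatar over a reliable data channel.
async function sendChatToAvatar(text: string) {
  await room.localParticipant.publishData(encoder.encode(text), {
    reliable: true,
    topic: 'avatar-chat', // placeholder topic name
  });
}

// Join the room and render avatar replies / live captions as data messages arrive.
async function joinRoom(url: string, userToken: string) {
  room.on(RoomEvent.DataReceived, (payload, participant, _kind, topic) => {
    const text = decoder.decode(payload);
    if (topic === 'avatar-subtitles') {
      showSubtitle(text);
    } else if (topic === 'avatar-chat') {
      appendChatMessage(participant?.identity ?? 'avatar', text);
    }
  });
  await room.connect(url, userToken);
}
```

The same data path could also carry interim versus final caption segments if the avatar's speech is synthesized incrementally.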

Implementation Considerations:

  • AI and Machine Learning: Utilizing AI to interpret user inputs for dynamic avatar creation and to drive the interaction model (speech recognition, text-to-speech, natural language processing).
  • LiveKit Integration: How can LiveKit support the backend infrastructure for such an interaction model? This includes considerations for low-latency audio/video streaming and data transmission for text chat (one possible backend-participant sketch follows this list).
  • Customizability and Scalability: Ensuring the system supports a wide range of avatar customizations and can scale to support a large number of concurrent interactions.
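
On the LiveKit side, one plausible architecture is to run the avatar as just another room participant: a backend process joins the room with its own access token, subscribes to the user's audio, runs speech recognition → language model → speech synthesis, and publishes the synthesized audio (plus subtitle data messages) back into the room. A minimal token-minting sketch, assuming the livekit-server-sdk Node package; the identity and environment variable names are placeholders, and toJwt() is async in recent SDK versions:

```ts
import { AccessToken } from 'livekit-server-sdk';

// Mint a token that lets a backend "avatar" process join a user's room
// as an ordinary participant with publish/subscribe permissions.
async function createAvatarToken(roomName: string): Promise<string> {
  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    { identity: 'avatar-bot' }, // placeholder identity
  );
  token.addGrant({
    room: roomName,
    roomJoin: true,
    canPublish: true,     // publish the avatar's synthesized audio/video
    canSubscribe: true,   // subscribe to the user's audio for speech recognition
    canPublishData: true, // send chat replies and subtitle data messages
  });
  return token.toJwt();
}
```

Running one room per user/avatar session would also map naturally onto the scalability point above, since each session could then be scaled out independently.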

I am particularly interested in understanding whether the current capabilities of LiveKit can support such a feature, or whether there are planned updates that could facilitate it. Additionally, any guidance on how one might approach implementing this feature, considering the architectural and technological requirements, would be greatly appreciated.

Thank you for considering this request. I believe that the integration of digital avatars into communication platforms can significantly enhance user engagement and offer novel interaction experiences.
