Hacker News with Generative AI: Audio Generation

AudioX: Diffusion Transformer for Anything-to-Audio Generation (zeyuet.github.io)
Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs.

Audio Generation, Generative AI, Machine Learning, Artificial Intelligence, Computer Science

148 points by gnabgib 74 days ago | 19 comments

Bring Silent Videos to Life Sounds (Open-Source) (github.com/open-mmlab)
FoleyCrafter is a video-to-audio generation framework that produces realistic sound effects that are semantically relevant to, and synchronized with, the input video.

Open Source, Audio Generation, Video Processing, Machine Learning

14 points by BruceWok 120 days ago | 4 comments

Video-Guided Foley Sound Generation with Multimodal Controls (ificl.github.io)
Generating sound effects for videos often requires both artistic sound effects that diverge significantly from real-life sources and flexible control over the sound design.

Video Editing, Artificial Intelligence, Sound Design, Audio Generation

46 points by surprisetalk 212 days ago | 19 comments

Nvidia claims a new AI audio generator can make sounds never heard before (theverge.com)
Nvidia says its new AI music editor can create “sounds never heard before” — like a trumpet that meows. The tool, called Fugatto, is capable of generating music, sounds, and speech using text and audio inputs it’s never been trained on.

Artificial Intelligence, Music, Technology, Audio Generation

11 points by jnord 214 days ago | 3 comments

Show HN: PlayNote – NotebookLM but with custom voices and API (play.ai)
Turn your files and data into captivating audio creations. Enjoy cutting-edge AI voice synthesis.

AI, Audio Generation, Voice Synthesis, Software, New Releases

18 points by amrrs 227 days ago | 4 comments

Pushing the frontiers of audio generation (deepmind.google)
Our pioneering speech generation technologies are helping people around the world interact with more natural, conversational and intuitive digital assistants and AI tools.

Audio Generation, Artificial Intelligence, Speech Technology

237 points by meetpateltech 240 days ago | 107 comments

Amphion: An open-source audio, music, and speech generation toolkit (github.com/open-mmlab)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Audio Generation, Open Source, Music Generation, Speech Generation, Research

92 points by lapnect 244 days ago | 2 comments

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer (haidog-yaqub.github.io)
EzAudio is an advanced text-to-audio (T2A) generation model that creates high-quality audio from text prompts. It sets a new standard for open-source T2A models by delivering fast, efficient, and realistic sound effects generation.

Text-to-Audio, Audio Generation, Open Source, Artificial Intelligence, Machine Learning

99 points by blacktechnology 277 days ago | 17 comments

Generating audio for video: using video and text prompts to generate soundtracks (deepmind.google)

Generative AI, Audio Generation, Video, Machine Learning, Deep Learning

18 points by skoppula 374 days ago | 2 comments