Hacker News with Generative AI: Speech Generation

Qwen2.5-Omni Technical Report (huggingface.co)
In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

Amphion: An open-source audio, music, and speech generation toolkit (github.com/open-mmlab)
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.