Hacker News with Generative AI: Text-to-Speech

Show HN: KVoiceWalk – Voice cloning for Kokoro TTS using random walk algorithms (github.com/RobViren)
KVoiceWalk tries to create new Kokoro voice style tensors that clone target voices by using a random walk algorithm and a hybrid scoring method that combines Resemblyzer similarity, feature extraction, and self-similarity.

Voice Cloning, Machine Learning, Open Source, AI, Text-to-Speech

13 points by robviren 430 days ago | 0 comments

Finetune TTS Models Locally (reddit.com)

Finetuning, Text-to-Speech, Machine Learning

7 points by handfuloflight 432 days ago | 0 comments

Show HN: Dia, an open-weights TTS model for generating realistic dialogue (github.com/nari-labs)
Dia is a 1.6B-parameter text-to-speech model created by Nari Labs.

Open Source, Text-to-Speech, AI, Language Models

652 points by toebee 460 days ago | 191 comments

Coqui TTS: Free Text-to-Speech (coquitts.com)

Text-to-Speech, Open Source, AI, Software

19 points by kangfeibo 477 days ago | 13 comments

Orpheus-3B – Emotive TTS by Canopy Labs (canopylabs.ai)

Artificial Intelligence, Text-to-Speech, Software, Speech Synthesis

186 points by Zetaphor 493 days ago | 39 comments

Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] (arxiv.org)
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.

Generative AI, Text-to-Speech, Deep Learning, Artificial Intelligence, Computer Science

78 points by bilekas 506 days ago | 6 comments

ElevenReader (elevenreader.io)
Bring any book, article, PDF, newsletter, or text to life with ultra realistic AI narration in one app

AI, Text-to-Speech, Accessibility, Reading Tools, Software

305 points by mfiguiere 529 days ago | 179 comments

Zonos – Apache 2.0 licensed, Multilingual, Text to Speech model (zyphra.com)
We are excited to announce the release of Zonos-v0.1 beta, featuring two expressive and real-time text-to-speech (TTS) models with high-fidelity voice cloning. We are releasing our 1.6B transformer and 1.6B hybrid under an Apache 2.0 license.

Text-to-Speech, Artificial Intelligence, Open Source, New Releases

21 points by sibellavia 530 days ago | 3 comments

PlayAI's new Dialog model achieves 3:1 preference in human evals (play.ht)
PlayAI’s Dialog Text-to-Speech model is now in general availability, bringing multilingual capabilities and exceptional performance to applications requiring emotive, human-like speech. In recent third-party benchmark tests, Dialog was preferred by 10:1 vs. ElevenLabs v2.5 Turbo, and by over 3:1 vs. ElevenLabs Multilingual v2.0.

Artificial Intelligence, Text-to-Speech, Language Models

95 points by legofan94 537 days ago | 55 comments

Show HN: Voice Cloning and Multilingual TTS in One Click (Windows) (github.com/abus-aikorea)
Voice-Pro is a cutting-edge AI-powered web application designed to revolutionize multimedia content processing.

AI, Voice Cloning, Text-to-Speech, Software, Windows

9 points by vulcanidic 545 days ago | 3 comments

Edge TTS (github.com/rany2)
edge-tts is a Python module that allows you to use Microsoft Edge's online text-to-speech service from within your Python code or using the provided edge-tts or edge-playback command.

Python, Text-to-Speech, Microsoft Edge, Open Source, Libraries

107 points by smy20011 549 days ago | 65 comments

Generate audiobooks from E-books with Kokoro-82M (claudio.uk)
Kokoro v0.19 is a recently published text-to-speech model with just 82M params and very high-quality output.

Text-to-Speech, Generative AI, Software, Audiobooks

420 points by csantini 556 days ago | 246 comments

MathReader: Text-to-Speech for Mathematical Documents [pdf] (arxiv.org)
TTS (Text-to-Speech) document readers from Microsoft, Adobe, Apple, and OpenAI are in service worldwide. They provide relatively good TTS results for general plain text, but sometimes skip content or give unsatisfactory results for mathematical expressions.

Text-to-Speech, Mathematical Documents, Accessibility, AI, Research

19 points by bikenaga 557 days ago | 0 comments

Show HN: New Cartesia Text-to-Speech Model (cartesia.ai)
Real-time multimodal intelligence for every device

Text-to-Speech, AI, Show HN, Software, Cartesia

10 points by cartesia 590 days ago | 0 comments

Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground (play.ai)

Speech Recognition, Artificial Intelligence, Text-to-Speech, Machine Learning

49 points by dulldata 619 days ago | 13 comments

A CC-By Open-Source TTS Model with Voice Cloning (huggingface.co)
OuteTTS-0.1-350M is a novel text-to-speech synthesis model that leverages pure language modeling without external adapters or complex architectures. Built upon the LLaMa architecture using our Oute3-350M-DEV base model, it demonstrates that high-quality speech synthesis is achievable through a straightforward approach using crafted prompts and audio tokens.

Open Source, Text-to-Speech, Voice Cloning, AI, Language Modeling

131 points by amrrs 628 days ago | 31 comments

Show HN: I made a tutorial of how to use free edge TTS API with deno.js [video] (youtube.com)

Tutorials, Programming, Text-to-Speech, Deno.js, JavaScript

5 points by neochau 630 days ago | 0 comments

Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model (play.ht)
Today we’re releasing our most capable and conversational voice model that can speak in 30+ languages using any voice or accent, with industry-leading speed and accuracy. We’re also releasing 50+ new conversational AI voices across languages.

Multilingual, Artificial Intelligence, Text-to-Speech, Software

258 points by amrrs 649 days ago | 83 comments

Free Text-to-Speech App with natural voices (elevenlabs.io)

Text-to-Speech, Artificial Intelligence, Software, Accessibility

38 points by jslakro 702 days ago | 29 comments

Show HN: Using AI to generate custom sounds from text (image-effects.com)

Generative AI, Artificial Intelligence, Sound Design, Text-to-Speech

19 points by Mabroorahmed 723 days ago | 8 comments

Show HN: I generated 70k audiobooks with OpenAI Text-to-Speech (listenly.io)

OpenAI, Text-to-Speech, Audiobooks, Generative AI, Show HN

140 points by evan_ry 741 days ago | 109 comments

Coqui.ai TTS: A Deep Learning Toolkit for Text-to-Speech (github.com/coqui-ai)