Spark-TTS: Text-to-Speech Model with Single-Stream Decoupled Tokens [pdf]
(arxiv.org)
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.
ElevenReader
(elevenreader.io)
Bring any book, article, PDF, newsletter, or text to life with ultra-realistic AI narration in one app.
Zonos – Apache 2.0 licensed, Multilingual, Text to Speech model
(zyphra.com)
We are excited to announce the release of Zonos-v0.1 beta, featuring two expressive and real-time text-to-speech (TTS) models with high-fidelity voice cloning. We are releasing our 1.6B transformer and 1.6B hybrid under an Apache 2.0 license.
PlayAI's new Dialog model achieves 3:1 preference in human evals
(play.ht)
PlayAI’s Dialog Text-to-Speech model is now generally available, bringing multilingual capabilities and exceptional performance to applications that require emotive, human-like speech. In recent third-party benchmark tests, Dialog was preferred 10:1 over ElevenLabs v2.5 Turbo and by more than 3:1 over ElevenLabs Multilingual v2.0. Play the video below to hear what it sounds like, or visit our AI voiceover Studio to try it for yourself.
Show HN: Voice Cloning and Multilingual TTS in One Click (Windows)
(github.com/abus-aikorea)
Voice-Pro is an AI-powered web application for processing multimedia content, including voice cloning and multilingual text-to-speech.
Edge TTS
(github.com/rany2)
edge-tts is a Python module that allows you to use Microsoft Edge's online text-to-speech service from within your Python code or using the provided edge-tts or edge-playback command.
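The edge-tts entry mentions both a Python API and a command-line tool. Below is a minimal sketch of each. It assumes edge-tts is installed (`pip install edge-tts`) and that the machine can reach Microsoft's online service. `build_cli_command` and `synthesize` are hypothetical helper names used only for illustration, and `en-US-AriaNeural` is just an example voice.

```python
def build_cli_command(text: str, voice: str, out_path: str) -> list[str]:
    """Return an argument list for the edge-tts CLI that writes speech to out_path."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


async def synthesize(text: str, voice: str, out_path: str) -> None:
    """Synthesize text with the edge_tts Python API and save the audio to out_path."""
    # Imported lazily so this module loads even where edge-tts is not installed.
    # Calling this coroutine requires the edge-tts package and network access.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)
```

For example, `asyncio.run(synthesize("Hello, world!", "en-US-AriaNeural", "hello.mp3"))` writes the synthesized audio to `hello.mp3`. Passing the list returned by `build_cli_command` to `subprocess.run` does the same thing through the CLI. `edge-tts --list-voices` prints the voice names you can use.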
Generate audiobooks from E-books with Kokoro-82M
(claudio.uk)
Kokoro v0.19 is a recently published text-to-speech model with just 82M parameters and very high-quality output.
MathReader: Text-to-Speech for Mathematical Documents [pdf]
(arxiv.org)
TTS (Text-to-Speech) document readers from Microsoft, Adobe, Apple, and OpenAI are offered worldwide. They produce reasonably good results for general plain text, but they sometimes skip content or handle mathematical expressions unsatisfactorily.
Show HN: New Cartesia Text-to-Speech Model
(cartesia.ai)
Real-time multimodal intelligence for every device
Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground
(play.ai)
A CC-By Open-Source TTS Model with Voice Cloning
(huggingface.co)
OuteTTS-0.1-350M is a novel text-to-speech synthesis model that relies on pure language modeling, without external adapters or complex architectures. Built on the LLaMA architecture using our Oute3-350M-DEV base model, it demonstrates that high-quality speech synthesis is achievable through a straightforward approach using crafted prompts and audio tokens.
Play 3.0 mini – A lightweight, reliable, cost-efficient Multilingual TTS model
(play.ht)
Today we’re releasing our most capable and conversational voice model, which can speak in 30+ languages using any voice or accent, with industry-leading speed and accuracy. We’re also releasing 50+ new conversational AI voices across languages.