Hacker News with Generative AI: Audio Processing

FlowTSE: Target Speaker Extraction with Flow Matching (arxiv.org)
Target speaker extraction (TSE) aims to isolate a specific speaker's speech from a mixture using speaker enrollment as a reference.

Speech Recognition, Audio Processing, Machine Learning, Artificial Intelligence

25 points by agold97 55 days ago | 2 comments

LLMs can see and hear without any training (github.com/facebookresearch)

Computer Vision, Audio Processing, Generative AI

210 points by T-A 87 days ago | 66 comments

Raspberry Pi cluster spotted inside $6k audio processor (jeffgeerling.com)
People often ask me whether Pi clusters are useful besides just tinkering. I've built my fair share, including my most recent 'Lamp Rack' Kubernetes-in-a-Lamp cluster.

Raspberry Pi, Hardware, Audio Processing, Kubernetes

16 points by voxadam 102 days ago | 5 comments

Show HN: Open-source, native audio turn detection model (github.com/pipecat-ai)
This is an open source, community-driven, native audio turn detection model.

Open Source, Audio Processing, Machine Learning

126 points by kwindla 138 days ago | 28 comments

Send Data with Sound (github.com/solst-ice)
This application allows you to transmit and receive data through sound. It uses a simple encoding scheme to convert text into audio frequencies, which can be played through your speakers and picked up by a microphone.

Audio Processing, Data Transmission, Open Source, Software

32 points by amrrs 141 days ago | 28 comments
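
The "Send Data with Sound" entry above mentions a simple encoding scheme that converts text into audio frequencies. As a rough illustration of that general idea, and not the linked project's actual scheme, the sketch below maps each UTF-8 byte to its own tone frequency and back. The constants `BASE_HZ`, `STEP_HZ`, `SAMPLE_RATE`, and `TONE_SECONDS`, and all function names, are hypothetical choices made for this example.

```python
import math

# All constants below are assumptions for illustration only.
BASE_HZ = 1000.0      # frequency assigned to byte value 0
STEP_HZ = 20.0        # spacing between adjacent byte values
SAMPLE_RATE = 44100   # samples per second
TONE_SECONDS = 0.05   # duration of the tone for each byte


def text_to_frequencies(text):
    """Map each UTF-8 byte of `text` to one tone frequency in Hz."""
    return [BASE_HZ + STEP_HZ * b for b in text.encode("utf-8")]


def frequencies_to_text(freqs):
    """Invert text_to_frequencies, rounding to the nearest byte value."""
    data = bytes(round((f - BASE_HZ) / STEP_HZ) for f in freqs)
    return data.decode("utf-8")


def synthesize(freqs):
    """Render each frequency as a short sine tone; returns float samples in [-1, 1]."""
    n = int(SAMPLE_RATE * TONE_SECONDS)
    samples = []
    for f in freqs:
        samples.extend(
            math.sin(2 * math.pi * f * i / SAMPLE_RATE) for i in range(n)
        )
    return samples


if __name__ == "__main__":
    freqs = text_to_frequencies("hi")
    print(freqs)
    print(frequencies_to_text(freqs))
```

A real receiver would have to estimate each tone's frequency from microphone input (for example with an FFT) and cope with noise and timing drift; this sketch skips that step and decodes directly from the known frequencies.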

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations (grisoon.github.io)
We present INFP, an audio-driven interactive head generation framework for dyadic conversations. Given the dual-track audio of a dyadic conversation and a single portrait image of an arbitrary agent, our framework can dynamically synthesize verbal, non-verbal, and interactive agent videos with lifelike facial expressions and rhythmic head pose movements. Additionally, our framework is lightweight yet powerful, making it practical in instant communication scenarios such as video conferencing. INFP denotes that our method is Interactive, Natural, Flash and Person-generic.

Generative AI, Computer Vision, Video Conferencing, Audio Processing

29 points by nnx 213 days ago | 21 comments

Fish Speech 1.5 (github.com/fishaudio)
This codebase and all models are released under CC-BY-NC-SA-4.0 License. Please refer to LICENSE for more details.

New Releases, Software, Audio Processing

25 points by artninja1988 230 days ago | 2 comments

Nvidia Fugatto: "World's Most Flexible Sound Machine" (nvidia.com)
A team of generative AI researchers created a Swiss Army knife for sound, one that allows users to control the audio output simply using text.

Generative AI, Audio Processing, Artificial Intelligence, Nvidia

74 points by microsoftedging 238 days ago | 47 comments

Show HN: Open-Source Tool to Remove Background Music from Videos (github.com/omeryusufyagci)
Fast Music Remover is a lightweight tool designed to remove music, sound effects and noise from internet media. Processing takes about 8% of the original source length; that's under 5 seconds for a minute-long video!

Open Source, Video Editing, Audio Processing, Tools

20 points by oyyagci 241 days ago | 4 comments

Audio Decomposition – open-source separation of music to constituent instruments (matthew-bird.com)
My plan for this project was to create a program to turn music to sheet music. It was mainly incentivised by my own desire to turn music to sheet music and the lack (from what I could tell) of open source, simple algorithms to perform audio source separation.

Music, Audio Processing, Open Source, Machine Learning

314 points by thunderbong 254 days ago | 64 comments

Hertz-dev, the first open-source base model for conversational audio (si.inc)
For the last few months, we at Standard Intelligence have focused on fundamental research on the frontier of audio-only speech generation. We're excited to announce that we're open-sourcing current checkpoints of our full-duplex, audio-only transformer base model, hertz-dev, with a total of 8.5 billion parameters.

Open Source, Audio Processing, Generative AI, Machine Learning, Artificial Intelligence

296 points by mnk47 261 days ago | 56 comments

A Golang pipeline abomination (poxate.com)
In this project, we need to overlay a looping short music track over a long voice soundtrack.

Golang, Audio Processing, Programming, Music

21 points by kermerlerper 262 days ago | 8 comments
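
The task described in the entry above, overlaying a looping short music track on a long voice track, can be sketched at the sample level. This is a minimal illustration only: it assumes both tracks are already decoded to float samples in [-1, 1] at the same sample rate, and it is not the Go pipeline from the linked post. The function name `overlay_loop` and the `music_gain` parameter are hypothetical.

```python
def overlay_loop(voice, music, music_gain=0.3):
    """Mix `music` under `voice`, repeating the music until the voice ends.

    Both inputs are sequences of float samples in [-1, 1] at the same
    sample rate (an assumption of this sketch).
    """
    if not music:
        return list(voice)
    mixed = []
    for i, v in enumerate(voice):
        # Wrap the music index so the short track loops for the full length.
        s = v + music_gain * music[i % len(music)]
        # Hard-clip to the valid sample range to avoid overflow on output.
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed


if __name__ == "__main__":
    voice = [0.0] * 5
    music = [1.0, -1.0]
    print(overlay_loop(voice, music, music_gain=0.5))
```

The output always has the length of the voice track, so the music is cut off wherever the voice ends rather than being played to the end of its loop.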

NotebookLlama: An open source version of NotebookLM (github.com/meta-llama)
This is a guided series of tutorials/notebooks that can be taken as a reference or course to build a PDF to Podcast workflow.

Open Source, Machine Learning, Tutorials, Audio Processing

322 points by bibinmohan 268 days ago | 72 comments

Debugging audio artifacts caused by... a serial port? (recall.ai)
At Recall.ai we run enormous infrastructure to process millions of meetings per month, in real-time.

Audio Processing, Debugging, Software, Infrastructure

56 points by davidgu 272 days ago | 29 comments

Omnio: First AI model that can natively reason over audio (soniox.com)
Omnio is the first multimodal AI model to comprehensively understand both conversations and human behavior through audio.

Artificial Intelligence, Audio Processing

13 points by lukax 280 days ago | 8 comments

Show HN: Detect if an audio file was generated by NotebookLM (github.com/ListenNotes)
A simple tool to detect whether an audio file was generated by NotebookLM.

Generative AI, Audio Processing, AI Detection

97 points by wenbin 291 days ago | 40 comments

Show HN: Reverb ASR+Diarization, the Best Open Source ASR for Long-Form Audio (ycombinator.com)
Today, we are launching and open sourcing our current generation ASR models named "Reverb."

Open Source, Speech Recognition, Audio Processing

12 points by leetharris 292 days ago | 1 comment

Lessons learnt building a real-time audio application in Python (vangemert.dev)

Python, Real-Time Applications, Audio Processing

60 points by spmvg 316 days ago | 32 comments

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency (loopyavatar.github.io)