Hacker News with Generative AI: Multimodal

Qwen2.5-Omni Technical Report (huggingface.co)
In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

Alibaba Qwen2.5-Omni-7B: Open Source End-to-End Multimodal AI Model (alizila.com)
Alibaba Cloud has launched Qwen2.5-Omni-7B, a unified end-to-end multimodal model in the Qwen series.

Meta Llama 3 vision multimodal models – how to use them and what they can do (theregister.com)
Meta has been influential in driving the development of open language models with its Llama family, but until now the only way to interact with them has been through text.

Molmo: a family of open multimodal AI models (allenai.org)

A Specialized UI Multimodal Model (motiff.com)

How it's Made: Interacting with Gemini through multimodal prompting (googleblog.com)
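Several of the stories above center on multimodal prompting, where a single request interleaves text with images, audio, or video. As a rough sketch of the common pattern (a hypothetical, vendor-neutral message structure, not the API of Gemini or any specific model), each part of the prompt is tagged with its modality and assembled in order:

```python
# Illustrative sketch of interleaved multimodal prompting.
# The message schema here is hypothetical, modeled loosely on common
# chat-style APIs; real providers each define their own payload format.
def build_prompt(parts):
    """Assemble one user message from ordered (modality, value) pairs."""
    return {
        "role": "user",
        "content": [{"type": modality, modality: value} for modality, value in parts],
    }

msg = build_prompt([
    ("text", "What is happening in this picture?"),
    ("image_url", "https://example.com/photo.png"),
    ("text", "Answer in one sentence."),
])
```

The key design point is that ordering is preserved: the model sees the image in context between the two text spans, which is what makes interleaved prompting more expressive than attaching all media up front.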