Hacker News with Generative AI: Multimodal Models

Ollama's new engine for multimodal models (ollama.com)
Ollama now supports multimodal models via Ollama’s new engine, starting with new vision multimodal models.

Generative AI, Multimodal Models, Computer Vision

353 points by LorenDB 427 days ago | 84 comments

Mistral releases Pixtral 12B, its first multimodal model (techcrunch.com)
French AI startup Mistral has released its first model that can process images as well as text.

Generative AI, Artificial Intelligence, France, Multimodal Models, Language Models

163 points by jerbear4328 673 days ago | 40 comments

Transfusion: Predict the next token and diffuse images with one multimodal model (arxiv.org)
We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data.

Generative AI, Machine Learning, Multimodal Models, Computer Vision

122 points by fzliu 675 days ago | 10 comments