Hacker News with Generative AI: Multimodal Models

Mistral releases Pixtral 12B, its first multimodal model (techcrunch.com)
French AI startup Mistral has released its first model that can process images as well as text.
Transfusion: Predict the next token and diffuse images with one multimodal model (arxiv.org)
We introduce Transfusion, a recipe for training a multimodal model over discrete and continuous data.