Hacker News with Generative AI: Multimodal

Meta Llama 3 vision multimodal models – how to use them and what they can do (theregister.com)
Meta has been influential in driving the development of open language models with its Llama family, but up until now, the only way to interact with them has been through text.
Molmo: a family of open multimodal AI models (allenai.org)
A Specialized UI Multimodal Model (motiff.com)
How it's Made: Interacting with Gemini through multimodal prompting (googleblog.com)