All-in-one embedding model for interleaved text, images, and screenshots
(voyageai.com)
TL;DR: We are excited to announce voyage-multimodal-3, a new state-of-the-art model for multimodal embeddings and a big step toward seamless RAG and semantic search over documents rich in both visuals and text. Unlike existing multimodal embedding models, voyage-multimodal-3 can vectorize interleaved text and images and capture key visual features from screenshots of PDFs, slides, tables, figures, and more, eliminating the need for complex document parsing.
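To make "interleaved text and images" concrete, here is a minimal Python sketch of how such a document might be passed to the model. It assumes the Voyage Python client exposes a `multimodal_embed` method that accepts a list of inputs, each an ordered list of strings and PIL images, and returns an object with an `embeddings` attribute; check the official documentation before relying on these names. The helper `embed_interleaved`, the `demo` function, and the file name `slide.png` are hypothetical and only for illustration.

```python
from typing import Any, List, Sequence

MODEL = "voyage-multimodal-3"


def embed_interleaved(client: Any, documents: Sequence[Sequence[Any]]) -> List[List[float]]:
    """Embed documents, each an ordered mix of text strings and image objects.

    Assumption: `client.multimodal_embed` exists with these keyword arguments
    and returns an object whose `embeddings` attribute holds one vector per
    input document.
    """
    result = client.multimodal_embed(
        inputs=[list(doc) for doc in documents],  # keep text/image order intact
        model=MODEL,
        input_type="document",
    )
    return result.embeddings


def demo() -> None:
    """Hypothetical usage; needs the voyageai and Pillow packages and an API key."""
    import voyageai
    from PIL import Image

    client = voyageai.Client()  # assumed to read the key from the environment
    page = Image.open("slide.png")  # hypothetical screenshot of a slide
    vectors = embed_interleaved(client, [["Quarterly revenue summary", page]])
    print(len(vectors), len(vectors[0]))
```

Because the screenshot is passed as an image, no separate step for parsing the slide's layout or tables appears in this sketch, which is the simplification the announcement describes.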