Hacker News with Generative AI: Text Embeddings

Text Embeddings are All Alike (arxiv.org)
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches.
Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust (github.com/MinishLab)
This crate provides a lightweight Rust implementation for loading and inference of Model2Vec static embedding models. For distillation and training, the Python Model2Vec package can be used.
Chrome's New Embedding Model: Smaller, Faster, Same Quality (dejan.ai)
Chrome’s latest update incorporates a new text embedding model that is 57% smaller (35.14MB vs 81.91MB) than its predecessor while maintaining virtually identical performance in semantic search tasks.
The best way to use text embeddings portably is with Parquet and Polars (minimaxir.com)
Text embeddings, particularly modern embeddings generated by large language models, are one of the most useful applications to come out of the generative AI boom.
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models (arxiv.org)
Many use cases require retrieving smaller portions of text, and dense vector-based retrieval systems often perform better with shorter text segments, as the semantics are less likely to be "over-compressed" in the embeddings.
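The core idea of late chunking is to embed the whole document once with a long-context model, then pool the contextualized token embeddings within each chunk's span, so every chunk vector still reflects the surrounding document. The pooling step can be sketched with NumPy; the random matrix below stands in for real token embeddings from a long-context model, and the span boundaries are invented for the example:

```python
import numpy as np

# Stand-in for contextualized token embeddings from a long-context
# embedding model: (n_tokens, embedding_dim)
rng = np.random.default_rng(0)
token_embs = rng.normal(size=(300, 64)).astype(np.float32)

# Chunk boundaries in token positions (illustrative)
chunk_spans = [(0, 100), (100, 200), (200, 300)]

def late_chunk(token_embs: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool contextualized token embeddings over each chunk span."""
    return np.stack([token_embs[start:end].mean(axis=0) for start, end in spans])

chunk_embs = late_chunk(token_embs, chunk_spans)
```

The contrast with naive chunking is where the token embeddings come from: here each chunk's tokens were encoded with the full document as context, rather than encoding each chunk in isolation.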