Hacker News with Generative AI: Code Retrieval

How do we evaluate vector-based code retrieval? (voyageai.com)
Despite the widespread use of vector-based code retrieval, evaluating how well embedding models actually retrieve code remains a common pain point.
Voyage-code-3 (voyageai.com)
TL;DR – Introducing voyage-code-3, our next-generation embedding model optimized for code retrieval. It outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively, on a suite of 32 code retrieval datasets. By supporting smaller dimensions with Matryoshka learning and quantized formats like int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality.
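To make the storage claim in the voyage-code-3 summary concrete, here is a minimal sketch of what truncating a Matryoshka-style embedding and quantizing it to int8 or binary might look like. It does not call the Voyage API. The helper names, the random stand-in vector, the 1024 and 256 dimension sizes, and the int8 scaling factor are all illustrative assumptions, not details from the post.

```python
import math
import random


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length.

    Assumes a Matryoshka-trained embedding, where a prefix of the vector
    is itself a usable (lower-dimensional) embedding.
    """
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def quantize_int8(vec):
    """Map floats in [-1, 1] to integers in [-127, 127].

    The fixed scale of 127 is an illustrative choice; real systems may
    calibrate the range per dimension or per dataset.
    """
    return [max(-127, min(127, round(x * 127))) for x in vec]


def quantize_binary(vec):
    """Keep only the sign of each component, packed 8 components per byte."""
    out = bytearray()
    for i in range(0, len(vec), 8):
        chunk = vec[i:i + 8]
        byte = 0
        for x in chunk:
            byte = (byte << 1) | (1 if x > 0 else 0)
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


# Stand-in for a model output: a random 1024-dimensional float vector.
random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(1024)]

small = truncate_and_normalize(full, 256)
as_int8 = quantize_int8(small)
as_bits = quantize_binary(small)

# Bytes per stored vector under each format (float32 = 4 bytes per value).
sizes = {
    "float32, 1024 dims": len(full) * 4,   # 4096
    "float32, 256 dims": len(small) * 4,   # 1024
    "int8, 256 dims": len(as_int8),        # 256
    "binary, 256 dims": len(as_bits),      # 32
}
for label, nbytes in sizes.items():
    print(f"{label}: {nbytes} bytes")
```

Under these assumed sizes, the binary 256-dimensional vector takes 128 times less space than the full float32 vector. How much retrieval quality survives each step depends on the model and dataset, which is what the linked post measures.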