Hacker News with Generative AI: Vector Databases

Show HN: VectorVFS, your filesystem as a vector database (readthedocs.io)
VectorVFS is a lightweight Python package that transforms your Linux filesystem into a vector database by leveraging the native VFS (Virtual File System) extended attributes.

Python, File Systems, Vector Databases, Open Source, Software

279 points by perone 77 days ago | 138 comments

Product Quantization: Compressing high-dimensional vectors by 97% (pinecone.io)

Machine Learning, Data Compression, Vector Databases

10 points by jxmorris12 89 days ago | 0 comments

Sharding Pgvector (pgdog.dev)
If you find yourself working with embeddings, you’ve shopped around for a vector database. pgvector is a great option if you’re using Postgres already. Once you reach a certain scale (about a million arrays), building indices starts taking a long time. Some workarounds, like parallel workers, help, but you still need to fit the whole graph in memory.

Databases, PostgreSQL, Embeddings, Vector Databases

86 points by levkk 117 days ago | 10 comments

Ask HN: Alternatives to Vector DB? (ycombinator.com)
A while back I was looking for a vector database that would work across Windows / Mac / Linux platforms. Some of the options required specific processors like Intel. I am curious whether there are any alternatives to a Vector DB that can run cross-platform and are easy to set up.

Database, Cross-Platform, Alternatives, Open Source, Vector Databases

21 points by tmaly 125 days ago | 37 comments

Why Vector DBs Are the Wrong Abstraction – and What We Built Instead (topk.io)
We’ve spent the last three years building the most popular vector database on the market. In that time we realized that a database built around vectors as a primary key is simply the wrong abstraction, creating an unnecessary obstacle for users in production.

Vector Databases, Database Design, Artificial Intelligence

23 points by Void_ 132 days ago | 8 comments

Show HN: Tinyhnsw – The Littlest Vector Database (github.com/jbarrow)
TinyHNSW is a tiny, simple vector database. It weighs in at a measly few hundred lines of code. It's built on a straightforward (but not fast) implementation of HNSW in Python with minimal dependencies. It has an associated set of tutorials that build up to understanding how HNSW works, and how you can build your own TinyHNSW.

Python, Vector Databases, Machine Learning, Open Source, Tutorials

17 points by jbarrow 195 days ago | 0 comments

Powering AI RAG Applications with Vector Embeddings (sambanova.ai)
As AI developers strive to build faster, more accurate and contextually relevant Retrieval Augmented Generation (RAG) systems, they face significant challenges in efficiently managing large-scale unstructured data and delivering fast, accurate responses.

Artificial Intelligence, Vector Databases

10 points by fzliu 201 days ago | 0 comments

Why HNSW is not the answer and disk-based alternatives might be more practical (pgvecto.rs)
HNSW (Hierarchical Navigable Small World) has become the go-to algorithm for many vector databases. Its multi-layered graph structure and ability to efficiently navigate vector embeddings make it particularly appealing. However, despite its apparent advantages, HNSW may not be the optimal solution for large-scale and dynamic vector similarity search. In this blog post, we challenge the dominance of HNSW and explore why disk-based alternatives, such as IVF (Inverted File Index), might be more practical for massive datasets.

Vector Databases, HNSW, IVF, Database Performance, Data Structures

138 points by kevlened 210 days ago | 64 comments

Pinecone integrates AI inferencing with vector database (blocksandfiles.com)
GenAI inferencing can now be run directly from the Pinecone vector database to improve retrieval-augmented generation (RAG).

Artificial Intelligence, Databases, Vector Databases, GenAI

24 points by jimminyx 229 days ago | 18 comments

Introducing integrated inference: Embed, rerank, and retrieve data with one API (pinecone.io)
We’re excited to announce expanded inference capabilities alongside our core vector database to make it even easier and faster to build high-quality, knowledgeable AI applications with Pinecone.

Artificial Intelligence, Database, APIs, Vector Databases

1 point by jimminyx 230 days ago | 0 comments

Elasticsearch Was Great, but Vector Databases Are the Future (thenewstack.io)

Database Technology, Vector Databases, Elasticsearch

16 points by DISCURSIVE 244 days ago | 7 comments

Scaling Document Data Extraction with LLMs and Vector Databases (timescale.com)
Extracting structured data from unstructured documents is a powerful use case for large language models (LLMs). Data extraction from complex documents has long been a challenge: doing it entirely by hand, or with current intelligent document processing (IDP) platforms built on previous-generation machine learning or natural language processing (NLP) techniques, is time-consuming and tedious.

Data Extraction, Vector Databases

12 points by avthar 248 days ago | 2 comments

Vector databases are the wrong abstraction (timescale.com)
"Your embeddings are out of sync again."

Databases, Vector Databases, Artificial Intelligence

493 points by jascha_eng 265 days ago | 90 comments

The PlanetScale vectors public beta (planetscale.com)
We're excited to announce that PlanetScale vector search and storage is now available in open beta! With PlanetScale vector support, you can store your vector data alongside your application's relational MySQL data — eliminating the need for a separate specialized vector database.

Databases, Open Source, Beta Releases, Vector Databases, Cloud Computing

133 points by ksec 272 days ago | 35 comments

A Vector Database Plays Mario Kart 64 (medium.com)
In this article, I’ll introduce you to an original application of image search. I’ve named it Qdrant Kart, and, as you might guess, it involves using a Vector Database (Qdrant) to play Mario Kart 64 — one of my all-time favorite games.

Gaming, Vector Databases, Artificial Intelligence, Image Search

11 points by mtrofficus 292 days ago | 7 comments

BBQvec: An open-source, embedded vector index for Rust and Go (daxe.ai)
At Daxe, we’re building Structured Semantic Search – a complete AI search stack. Our team leverages our collective experience from OpenAI, Google, Lyft, AWS, Harvard, Berkeley, and Darden to create novel technologies for developers and organizations to harness the full potential of their data.

Open Source, Programming Languages, Search Engines, Vector Databases

12 points by thunderbong 298 days ago | 1 comment

Using the Pinecone vector database in .NET (infoworld.com)
If you’re building generative AI applications, you need to control the data used to generate answers to user queries.

.NET, Generative AI, Vector Databases

1 point by benocodes 306 days ago | 1 comment

PGVector's Missing Features (trieve.ai)
PGVector offers infrastructure simplicity at the cost of missing some key features desirable in search solutions. We explain what those are in this blog post.

Open Source, Search, Vector Databases

51 points by skeptrune 311 days ago | 11 comments

Create a RAG Pipeline with Pinecone (vectorize.io)
This quickstart will walk you through creating and scheduling a pipeline that collects data from an Amazon S3 bucket, creates vector embeddings using an OpenAI embedding model, and writes the vectors to your Pinecone search index.

RAG, Vector Databases, Search, OpenAI, Cloud Computing

13 points by bytearray 312 days ago | 0 comments

Show HN: No-Code ETL Framework for Vector Databases (github.com/ContextData)

No-Code, ETL, Vector Databases

11 points by jide_tracc 332 days ago | 0 comments

MariaDB Introduces Open-Source Vector Preview (infoq.com)

MariaDB, Open Source, Databases, Vector Databases

30 points by gsky 337 days ago | 6 comments

SQLite-vec v0.1.0: a vector search SQLite extension that runs everywhere (alexgarcia.xyz)

Databases, SQLite, Search, Vector Databases

40 points by Tiberium 354 days ago | 2 comments

Vector DB Comparison List (superlinked.com)