Hacker News with Generative AI: Database

Show HN: AGX – Open-Source Data Exploration for ClickHouse (The New Standard?) (github.com/agnosticeng)
agx is a desktop application built with Tauri and SvelteKit that provides a modern interface for exploring and querying data using ClickHouse's embedded database engine (chdb).
Preview: Amazon S3 Tables and Lakehouse in DuckDB (duckdb.org)
TL;DR: We are happy to announce a new preview feature that adds support for Apache Iceberg REST Catalogs, enabling DuckDB users to connect to Amazon S3 Tables and Amazon SageMaker Lakehouse with ease.
Ask HN: Alternatives to Vector DB? (ycombinator.com)
A while back I was looking for a vector database that would work across Windows, Mac, and Linux. Some of the options required specific processors, such as Intel's. I am curious: are there any alternatives to a vector DB that run cross-platform and are easy to set up?
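For modest corpus sizes, one answer is to skip the database entirely: an exact brute-force scan is portable, dependency-free, and trivial to set up. A minimal sketch in pure Python (the corpus and vectors here are made-up toy data):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k=3):
    """Exact nearest neighbours by brute-force scan; fine for small corpora."""
    scored = ((cosine(query, vec), key) for key, vec in corpus.items())
    return sorted(scored, reverse=True)[:k]

# Hypothetical toy corpus of 3-dimensional embeddings.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], corpus, k=2))
```

Past a few hundred thousand vectors this stops being practical, but below that it avoids every platform-compatibility issue a specialized engine can bring.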
Selective async commits in PostgreSQL – balancing durability and performance (shayon.dev)
I was recently looking into some workloads that generate a lot of I/O and CPU contention on some very high-write code paths and came across synchronous_commit (https://www.postgresql.org/docs/current/wal-async-commit.html). It can be very tempting to turn this off globally, because the performance gains in I/O, CPU, and TPS (transactions per second) are very hard to overlook: I saw the I/O contention disappear entirely, CPU drop 20% (at peak), and TPS increase by 30%.
Nubmq: A high-performance key-value store engine built from first principles (ycombinator.com)
What if we just didn't decompress it? (spiraldb.com)
Vortex is unique in the way it evaluates filter and projection expressions by supporting full compute push-down, in many cases avoiding decompression entirely.
An Experimental Study of Bitmap Compression vs. Inverted List Compression (dl.acm.org)
Bitmap compression has been studied extensively in the database area and many efficient compression schemes were proposed, e.g., BBC, WAH, EWAH, and Roaring. Inverted list compression is also a well-studied topic in the information retrieval community and many inverted list compression algorithms were developed as well, e.g., VB, PforDelta, GroupVB, Simple8b, and SIMDPforDelta.
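The operation both families optimize is the same: intersecting sorted lists of row ids. A toy illustration of why the bitmap representation is attractive, using Python's arbitrary-precision integers as uncompressed bitmaps (the schemes above, BBC/WAH/EWAH/Roaring, encode the same bit patterns more compactly):

```python
def to_bitmap(row_ids):
    """Encode a set of row ids as one big integer (bit i set => row i present)."""
    bm = 0
    for r in row_ids:
        bm |= 1 << r
    return bm

def from_bitmap(bm):
    """Decode a bitmap back into a sorted list of row ids."""
    out, i = [], 0
    while bm:
        if bm & 1:
            out.append(i)
        bm >>= 1
        i += 1
    return out

a = to_bitmap([1, 5, 9, 200])
b = to_bitmap([5, 9, 10, 200])
# Intersecting two row-id lists becomes a single bitwise AND.
print(from_bitmap(a & b))  # [5, 9, 200]
```

An inverted-list scheme would instead merge the two sorted id lists directly; which side wins depends on density, which is exactly what the paper's experiments measure.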
EdgeDB is now Gel and Postgres is the future (geldata.com)
We have news! EdgeDB is rebranding as Gel, more on that below.
Do It Yourself Database CDN with Embedded Replicas (turso.tech)
Imagine you have a user in Singapore, and your database is in the US. Every time the user makes a request, it has to travel halfway around the world, which can lead to high latency and poor performance.
Show HN: OLake [open source] – Fastest database-to-Iceberg data replication tool (ycombinator.com)
Hi HN, today we’re excited to introduce OLake (github.com/datazip-inc/olake, 130+ and growing fast), an open-source tool built to help you replicate database data (MongoDB for now; MySQL and Postgres are under development) into a data lakehouse at higher speed, without the hassle of managing Debezium or Kafka. It is at least 10x faster than Airbyte and Fivetran at a fraction of the cost; see the docs for benchmarks: https://olake.io/docs/connectors/mongodb/benchmarks.
SQL pipe syntax available in public preview in BigQuery (cloud.google.com)
Pipe query syntax is an extension to GoogleSQL that supports a linear query structure designed to make your queries easier to read, write, and maintain. You can use pipe syntax anywhere you write GoogleSQL.
A New Postgres Block Storage Layout for Full Text Search (paradedb.com)
One Billion JSON Challenge (clickhouse.com)
We took on Gunnar Morling’s One Billion Row Challenge almost exactly a year ago, testing how quickly a 1-billion-row text file could be aggregated.
Adding concurrent read/write to DuckDB with Arrow Flight (definite.app)
We've been thinking a lot about latency, streaming and (near) real-time analytics lately. At Definite, we deal with a lot of data pipelines. In most cases (e.g. ingesting Stripe data), our customers are fine with batch processing (e.g. every hour). But as we've grown, we've seen more and more need for near real-time pipelines (e.g. ingesting events or CDC from Postgres).
Composable SQL (borretti.me)
SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.
An experiment of adding recommendation engine to your app using pgvector search (silk.us)
The reason I am excited about the latest genAI and vector search developments is that you can “plug in” these new technologies into your existing applications and their data, without having to re-platform or re-engineer your entire stack first. You can keep your existing application code and database design exactly as it is and then just add new genAI & vector search features where it makes sense.
The Mythical IO-Bound Rails App (byroot.github.io)
When the topic of Rails performance comes up, it is commonplace to hear that the database is the bottleneck, so Rails applications are IO-bound anyway, hence Ruby performance doesn’t matter that much, and all you need is a healthy dose of concurrency to make your service scale.
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately, these days a cheap NVMe SSD can supply data much faster than a query interpreter can process it.
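A minimal sketch of the gap being discussed, contrasting per-row interpretation of an expression tree with a one-time "compile" into nested closures (a stand-in for the cheap compilation tier the post argues for; the expression format is made up):

```python
# A filter like "x > 10 AND x < 100" as an expression tree.
tree = ("and", ("gt", "x", 10), ("lt", "x", 100))

def interpret(node, row):
    """Walk the tree for every row -- dispatch cost paid per row."""
    op = node[0]
    if op == "and":
        return interpret(node[1], row) and interpret(node[2], row)
    if op == "gt":
        return row[node[1]] > node[2]
    if op == "lt":
        return row[node[1]] < node[2]
    raise ValueError(op)

def compile_expr(node):
    """Turn the tree into nested closures once -- dispatch cost paid per query."""
    op = node[0]
    if op == "and":
        left, right = compile_expr(node[1]), compile_expr(node[2])
        return lambda row: left(row) and right(row)
    if op in ("gt", "lt"):
        key, val = node[1], node[2]
        if op == "gt":
            return lambda row: row[key] > val
        return lambda row: row[key] < val
    raise ValueError(op)

pred = compile_expr(tree)
rows = [{"x": v} for v in (5, 50, 500)]
print([r["x"] for r in rows if pred(r)])  # [50]
```

Both paths give identical answers; the difference is where the tree-walking cost lands, which is exactly the trade-off between an interpreter, a cheap compiler, and a full LLVM-style backend.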
Database Release and End-to-End Testing: ClickHouse Database Cloning (notion.site)
How bloom filters made SQLite 10x faster (avi.im)
This is the fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster for analytical queries.
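The underlying trick, in miniature: a Bloom filter answers "definitely not present" very cheaply, so a join can skip most of its expensive inner-loop probes. A self-contained sketch (the bit-array size and hashing scheme here are illustrative, not SQLite's):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array.
    May report false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

# Build the filter from the join keys on one side of the join...
bf = BloomFilter()
for key in ("alice", "bob", "carol"):
    bf.add(key)

# ...then consult it before each expensive per-row lookup.
print(bf.might_contain("alice"))   # True
print(bf.might_contain("nobody"))  # False (with very high probability)
```

The win in the analytical-join case is that a filter probe is a few hash computations against an in-memory bit array, versus a full B-tree descent for every candidate row.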
Reads Causing Writes in Postgres (jesipow.com)
It is good practice to regularly inspect the statements running in the hot path of your Postgres instance. One way to do this is to examine the pg_stat_statements view, which shows various statistics about the SQL statements executed by the Postgres server.
ScyllaDB – Why We're Moving to a Source Available License (scylladb.com)
ScyllaDB has decided to focus on a single release stream – ScyllaDB Enterprise. Starting with the ScyllaDB Enterprise 2025.1 release (ETA February 2025):
VectorChord: Store 400k Vectors for $1 in PostgreSQL (pgvecto.rs)
We’re pleased to announce our new vector search extension for PostgreSQL, providing a highly cost-effective way to manage large vectors. Using VectorChord, you can achieve a QPS of 131 with 0.95 precision on 100 million 768-dimensional vectors for the top 10 queries. This setup costs only $250 monthly and can be hosted on a single machine.
Supabase AI Assistant v2 (supabase.com)
Today we are releasing Supabase Assistant v2 in the Dashboard - a global assistant with several new abilities:
Introducing integrated inference: Embed, rerank, and retrieve data with one API (pinecone.io)
We’re excited to announce expanded inference capabilities alongside our core vector database to make it even easier and faster to build high-quality, knowledgeable AI applications with Pinecone.
Show HN: WeSQL – An Innovative MySQL That Stores All Data on S3 (github.com/wesql)
WeSQL is an innovative MySQL distribution that adopts a compute-storage separation architecture, with storage backed by S3 (and S3-compatible systems). It can run on any cloud, ensuring no vendor lock-in.
Sqlpkg – The SQLite Extension Hub (sqlpkg.org)
Find SQLite extensions using the search box above. You can download and install them manually, or use the sqlpkg package manager.
Pg_karnak: Transactional schema migration across tenant databases (thenile.dev)
When we need to describe Nile in a single sentence, we say "PostgreSQL re-engineered for multi-tenant apps".
FQL: A KV Query Language (github.com/janderland)
FQL provides a query language and an alternative client API for Foundation DB.
Steps in Writing Analytical SQL Queries (crunchydata.com)
It is never immediately obvious how to go from a simple SQL query to a complex one -- especially if it involves intricate calculations.
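One dependable approach is to build the query incrementally with CTEs, one transformation per step, checking the output at each stage. A runnable sketch against Python's bundled SQLite (the schema and numbers are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 10), ('alice', 30), ('bob', 5), ('bob', 5), ('carol', 100);
""")

# Each CTE is one step: per-customer totals first, then a grand total,
# then combine the two to get each customer's share of revenue.
query = """
WITH per_customer AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
overall AS (
    SELECT SUM(total) AS grand_total FROM per_customer
)
SELECT p.customer,
       p.total,
       ROUND(100.0 * p.total / o.grand_total, 1) AS pct_of_revenue
FROM per_customer p CROSS JOIN overall o
ORDER BY p.total DESC;
"""
for row in conn.execute(query):
    print(row)
```

Each intermediate CTE can be selected from on its own while you are developing, which is what makes the jump from a simple query to an intricate calculation tractable.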