Hacker News with Generative AI: Database

A PostgreSQL planner semi-join gotcha with CTE, LIMIT, and RETURNING (shayon.dev)
I recently discovered an unexpected behavior in PostgreSQL involving a pattern of using a Common Table Expression (CTE) with DELETE ... RETURNING and LIMIT to process a batch of items from a queue-like table. What seemed straightforward turned out to have a surprising interaction with the query planner.
RDS Reserved Instances: Where Did All the New Instance Types Go? (duckbillgroup.com)
AWS is excluding newer RDS instance types from Reserved Instance purchases. Is this oversight or the quiet continuation of their RI deprecation strategy?
Dragonfly Is Not Redis: An Open Letter to the Community (dragonflydb.io)
Modern architecture, better performance, lower cost, and built for today’s needs and future scale. Dragonfly is not Redis.
Beyond Performance: Measuring the environmental impact of analytical databases (arxiv.org)
The exponential growth of data is making query processing increasingly critical for modern computing infrastructure, yet the environmental impact of database operations remains poorly understood and largely overlooked.
Jepsen: Amazon RDS for PostgreSQL 17.4 (jepsen.io)
Amazon RDS for PostgreSQL is an Amazon Web Services (AWS) service which provides managed instances of the PostgreSQL database. We show that Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot Isolation, the strongest consistency model supported across all endpoints. Healthy clusters occasionally allow Long Fork and other G-nonadjacent cycles. These phenomena occurred in every version tested, from 13.15 to 17.4. Amazon RDS for PostgreSQL may instead provide Parallel Snapshot Isolation.
Beyond ELK: Lightweight and Scalable Cloud-Native Log Monitoring (greptime.com)
This article explores the growing limitations of the ELK stack in modern log storage scenarios and introduces GreptimeDB as a next-generation log database with advantages in both architecture and user experience.
Wikipedia: Database Download (wikipedia.org)
Wikipedia offers free copies of all available content to interested users. These databases can be used for mirroring, personal use, informal backups, offline use or database queries (such as for Wikipedia:Maintenance). All text content is licensed under the Creative Commons Attribution-ShareAlike 4.0 License (CC-BY-SA), and most is additionally licensed under the GNU Free Documentation License (GFDL). Images and other files are available under different terms, as detailed on their description pages.
Abusing DuckDB-WASM by making SQL draw 3D graphics (Sort Of) (hey.earth)
I had this slightly crazy idea: Could I ditch most of the conventional JavaScript game loop and rendering logic and build a 3D game engine where SQL queries did the heavy lifting?
Everything You Need to Know About Incremental View Maintenance (materializedview.io)
Incremental view maintenance has been a hot topic lately.
New Apache Cassandra Release Saves 400% IOPS (simplyblock.io)
On April 10, 2025, the Apache Software Foundation released version 5.0.4 of Apache Cassandra, bringing significant performance optimizations for all users—but especially for those relying on remotely attached storage like Amazon EBS. The standout feature in this release is an overhaul of the compaction algorithm aimed at slashing IOPS usage while increasing overall throughput.
Faster interpreters in Go: Catching up with C++ (planetscale.com)
The SQL evaluation engine that ships with Vitess, the open-source database that powers PlanetScale, was originally implemented as an AST evaluator that used to operate directly on the SQL AST generated by our parser. Over this past year, we've gradually replaced it with a Virtual Machine which, despite being written natively in Go, performs similarly to the original C++ evaluation code in MySQL.
Show HN: AGX – Open-Source Data Exploration for ClickHouse (The New Standard?) (github.com/agnosticeng)
agx is a desktop application built with Tauri and SvelteKit that provides a modern interface for exploring and querying data using ClickHouse's embedded database engine (chdb).
Preview: Amazon S3 Tables and Lakehouse in DuckDB (duckdb.org)
TL;DR: We are happy to announce a new preview feature that adds support for Apache Iceberg REST Catalogs, enabling DuckDB users to connect to Amazon S3 Tables and Amazon SageMaker Lakehouse with ease.
Ask HN: Alternatives to Vector DB? (ycombinator.com)
A while back I was looking for a vector database that would work across Windows / Mac / Linux platforms. Some of the options required specific processors like Intel. I am curious if there are any alternatives to a Vector DB that can run cross-platform and are easy to set up?
Selective async commits in PostgreSQL – balancing durability and performance (shayon.dev)
I was recently looking into some workloads that generate a lot of I/O and CPU contention on some very high-write code paths and came across synchronous_commit (https://www.postgresql.org/docs/current/wal-async-commit.html). It can be very tempting to turn this off globally because the performance gains in terms of I/O, CPU, and TPS (transactions per second) are very hard to overlook. I noticed I/O completely gone, CPU down 20% (at peak), and a 30% increase in TPS.
Nubmq: A high performant key-value store engine built from first principles (ycombinator.com)
What if we just didn't decompress it? (spiraldb.com)
Vortex is unique in the way it evaluates filter and projection expressions by supporting full compute push-down, in many cases avoiding decompression entirely.
An Experimental Study of Bitmap Compression vs. Inverted List Compression (dl.acm.org)
Bitmap compression has been studied extensively in the database area and many efficient compression schemes were proposed, e.g., BBC, WAH, EWAH, and Roaring. Inverted list compression is also a well-studied topic in the information retrieval community and many inverted list compression algorithms were developed as well, e.g., VB, PforDelta, GroupVB, Simple8b, and SIMDPforDelta.
EdgeDB is now Gel and Postgres is the future (geldata.com)
We have news! EdgeDB is rebranding as Gel, more on that below.
Do It Yourself Database CDN with Embedded Replicas (turso.tech)
Imagine you have a user in Singapore, and your database is in the US. Every time the user makes a request, it has to travel halfway around the world, which can lead to high latency and poor performance.
Show HN: OLake[open source] Fastest database to Iceberg data replication tool (ycombinator.com)
Hi HN, Today we’re excited to introduce OLake (github.com/datazip-inc/olake, 130+ and growing fast), an open-source tool built to help you replicate database data (MongoDB for now; MySQL and Postgres are under development) into a data lakehouse at faster speed, without the hassle of managing Debezium or Kafka (at least 10x faster than Airbyte and Fivetran at a fraction of the cost; refer to the docs for benchmarks - https://olake.io/docs/connectors/mongodb/benchmarks).
SQL pipe syntax available in public preview in BigQuery (cloud.google.com)
Pipe query syntax is an extension to GoogleSQL that supports a linear query structure designed to make your queries easier to read, write, and maintain. You can use pipe syntax anywhere you write GoogleSQL.
A New Postgres Block Storage Layout for Full Text Search (paradedb.com)
One Billion JSON Challenge (clickhouse.com)
We took on Gunnar Morling’s One Billion Row Challenge almost exactly a year ago, testing how quickly a 1-billion-row text file could be aggregated.
Adding concurrent read/write to DuckDB with Arrow Flight (definite.app)
We've been thinking a lot about latency, streaming and (near) real-time analytics lately. At Definite, we deal with a lot of data pipelines. In most cases (e.g. ingesting Stripe data), our customers are fine with batch processing (e.g. every hour). But as we've grown, we've seen more and more need for near real-time pipelines (e.g. ingesting events or CDC from Postgres).
Composable SQL (borretti.me)
SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.
An experiment of adding recommendation engine to your app using pgvector search (silk.us)
The reason I am excited about the latest genAI and vector search developments is that you can “plug in” these new technologies into your existing applications and their data, without having to re-platform or re-engineer your entire stack first. You can keep your existing application code and database design exactly as it is and then just add new genAI & vector search features where it makes sense.
The Mythical IO-Bound Rails App (byroot.github.io)
When the topic of Rails performance comes up, it is commonplace to hear that the database is the bottleneck, so Rails applications are IO-bound anyway, hence Ruby performance doesn’t matter that much, and all you need is a healthy dose of concurrency to make your service scale.
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately these days a cheap nvme ssd can supply data much faster than a query interpreter can process it.
Database Release and End-to-End Testing: ClickHouse Database Cloning (notion.site)