Hacker News with Generative AI: Database

Show HN: AGX – Open-Source Data Exploration for ClickHouse (The New Standard?) (github.com/agnosticeng)
agx is a desktop application built with Tauri and SvelteKit that provides a modern interface for exploring and querying data using ClickHouse's embedded database engine (chdb).
Preview: Amazon S3 Tables and Lakehouse in DuckDB (duckdb.org)
TL;DR: We are happy to announce a new preview feature that adds support for Apache Iceberg REST Catalogs, enabling DuckDB users to connect to Amazon S3 Tables and Amazon SageMaker Lakehouse with ease.
Ask HN: Alternatives to Vector DB? (ycombinator.com)
A while back I was looking for a vector database that would work across Windows, Mac, and Linux. Some of the options required specific processors, such as Intel's. I am curious: are there any alternatives to a vector DB that run cross-platform and are easy to set up?
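For modest corpus sizes, one answer is to skip the database entirely: an exact brute-force scan is portable, dependency-free, and trivial to set up. A minimal sketch in pure Python (the corpus and vectors here are made-up toy data):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k=3):
    """Exact nearest neighbours by brute-force scan; fine for small corpora."""
    scored = ((cosine(query, vec), key) for key, vec in corpus.items())
    return sorted(scored, reverse=True)[:k]

# Hypothetical toy corpus of 3-dimensional embeddings.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
print(top_k([1.0, 0.05, 0.0], corpus, k=2))
```

Past a few hundred thousand vectors this stops being practical, but below that it avoids every platform-compatibility issue a specialized engine can bring.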
Selective async commits in PostgreSQL – balancing durability and performance (shayon.dev)
I was recently looking into some workloads that generate a lot of I/O and CPU contention on some very high-write code paths and came across synchronous_commit (https://www.postgresql.org/docs/current/wal-async-commit.html). It can be very tempting to turn this off globally, because the performance gains in I/O, CPU, and TPS (transactions per second) are very hard to overlook: I saw the I/O contention disappear entirely, CPU drop 20% (at peak), and TPS increase by 30%.
Nubmq: A high-performance key-value store engine built from first principles (ycombinator.com)
What if we just didn't decompress it? (spiraldb.com)
Vortex is unique in the way it evaluates filter and projection expressions by supporting full compute push-down, in many cases avoiding decompression entirely.
An Experimental Study of Bitmap Compression vs. Inverted List Compression (dl.acm.org)
Bitmap compression has been studied extensively in the database area and many efficient compression schemes were proposed, e.g., BBC, WAH, EWAH, and Roaring. Inverted list compression is also a well-studied topic in the information retrieval community and many inverted list compression algorithms were developed as well, e.g., VB, PforDelta, GroupVB, Simple8b, and SIMDPforDelta.
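The operation both families optimize is the same: intersecting sorted lists of row ids. A toy illustration of why the bitmap representation is attractive, using Python's arbitrary-precision integers as uncompressed bitmaps (the schemes above, BBC/WAH/EWAH/Roaring, encode the same bit patterns more compactly):

```python
def to_bitmap(row_ids):
    """Encode a set of row ids as one big integer (bit i set => row i present)."""
    bm = 0
    for r in row_ids:
        bm |= 1 << r
    return bm

def from_bitmap(bm):
    """Decode a bitmap back into a sorted list of row ids."""
    out, i = [], 0
    while bm:
        if bm & 1:
            out.append(i)
        bm >>= 1
        i += 1
    return out

a = to_bitmap([1, 5, 9, 200])
b = to_bitmap([5, 9, 10, 200])
# Intersecting two row-id lists becomes a single bitwise AND.
print(from_bitmap(a & b))  # [5, 9, 200]
```

An inverted-list scheme would instead merge the two sorted id lists directly; which side wins depends on density, which is exactly what the paper's experiments measure.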
EdgeDB is now Gel and Postgres is the future (geldata.com)
We have news! EdgeDB is rebranding as Gel, more on that below.
Do It Yourself Database CDN with Embedded Replicas (turso.tech)
Imagine you have a user in Singapore, and your database is in the US. Every time the user makes a request, it has to travel halfway around the world, which can lead to high latency and poor performance.
Show HN: OLake [open source] – Fastest database-to-Iceberg data replication tool (ycombinator.com)
Hi HN, today we’re excited to introduce OLake (github.com/datazip-inc/olake, 130+ and growing fast), an open-source tool built to help you replicate database data (MongoDB for now; MySQL and Postgres are under development) into a data lakehouse at higher speed, without the hassle of managing Debezium or Kafka. It is at least 10x faster than Airbyte and Fivetran at a fraction of the cost; see the docs for benchmarks: https://olake.io/docs/connectors/mongodb/benchmarks.
SQL pipe syntax available in public preview in BigQuery (cloud.google.com)
Pipe query syntax is an extension to GoogleSQL that supports a linear query structure designed to make your queries easier to read, write, and maintain. You can use pipe syntax anywhere you write GoogleSQL.
A New Postgres Block Storage Layout for Full Text Search (paradedb.com)
One Billion JSON Challenge (clickhouse.com)
We took on Gunnar Morling’s One Billion Row Challenge almost exactly a year ago, testing how quickly a 1-billion-row text file could be aggregated.
Adding concurrent read/write to DuckDB with Arrow Flight (definite.app)
We've been thinking a lot about latency, streaming and (near) real-time analytics lately. At Definite, we deal with a lot of data pipelines. In most cases (e.g. ingesting Stripe data), our customers are fine with batch processing (e.g. every hour). But as we've grown, we've seen more and more need for near real-time pipelines (e.g. ingesting events or CDC from Postgres).
Composable SQL (borretti.me)
SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.
An experiment of adding recommendation engine to your app using pgvector search (silk.us)
The reason I am excited about the latest genAI and vector search developments is that you can “plug in” these new technologies into your existing applications and their data, without having to re-platform or re-engineer your entire stack first. You can keep your existing application code and database design exactly as it is and then just add new genAI & vector search features where it makes sense.
The Mythical IO-Bound Rails App (byroot.github.io)
When the topic of Rails performance comes up, it is commonplace to hear that the database is the bottleneck, so Rails applications are IO-bound anyway, hence Ruby performance doesn’t matter that much, and all you need is a healthy dose of concurrency to make your service scale.
The missing tier for query compilers (scattered-thoughts.net)
Database query engines used to be able to assume that disk latency was so high that the overhead of interpreting the query plan didn't matter. Unfortunately, these days a cheap NVMe SSD can supply data much faster than a query interpreter can process it.
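A minimal sketch of the gap being discussed, contrasting per-row interpretation of an expression tree with a one-time "compile" into nested closures (a stand-in for the cheap compilation tier the post argues for; the expression format is made up):

```python
# A filter like "x > 10 AND x < 100" as an expression tree.
tree = ("and", ("gt", "x", 10), ("lt", "x", 100))

def interpret(node, row):
    """Walk the tree for every row -- dispatch cost paid per row."""
    op = node[0]
    if op == "and":
        return interpret(node[1], row) and interpret(node[2], row)
    if op == "gt":
        return row[node[1]] > node[2]
    if op == "lt":
        return row[node[1]] < node[2]
    raise ValueError(op)

def compile_expr(node):
    """Turn the tree into nested closures once -- dispatch cost paid per query."""
    op = node[0]
    if op == "and":
        left, right = compile_expr(node[1]), compile_expr(node[2])
        return lambda row: left(row) and right(row)
    if op in ("gt", "lt"):
        key, val = node[1], node[2]
        if op == "gt":
            return lambda row: row[key] > val
        return lambda row: row[key] < val
    raise ValueError(op)

pred = compile_expr(tree)
rows = [{"x": v} for v in (5, 50, 500)]
print([r["x"] for r in rows if pred(r)])  # [50]
```

Both paths give identical answers; the difference is where the tree-walking cost lands, which is exactly the trade-off between an interpreter, a cheap compiler, and a full LLVM-style backend.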
Database Release and End-to-End Testing: ClickHouse Database Cloning (notion.site)
How bloom filters made SQLite 10x faster (avi.im)
This is the fascinating story of how researchers used Bloom filters cleverly to make SQLite 10x faster for analytical queries.
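The underlying trick, in miniature: a Bloom filter answers "definitely not present" very cheaply, so a join can skip most of its expensive inner-loop probes. A self-contained sketch (the bit-array size and hashing scheme here are illustrative, not SQLite's):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array.
    May report false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

# Build the filter from the join keys on one side of the join...
bf = BloomFilter()
for key in ("alice", "bob", "carol"):
    bf.add(key)

# ...then consult it before each expensive per-row lookup.
print(bf.might_contain("alice"))   # True
print(bf.might_contain("nobody"))  # False (with very high probability)
```

The win in the analytical-join case is that a filter probe is a few hash computations against an in-memory bit array, versus a full B-tree descent for every candidate row.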
Reads Causing Writes in Postgres (jesipow.com)
It is good practice to regularly inspect the statements running in the hot path of your Postgres instance. One way to do this is to examine the pg_stat_statements view, which shows various statistics about the SQL statements executed by the Postgres server.
ScyllaDB – Why We're Moving to a Source Available License (scylladb.com)
ScyllaDB has decided to focus on a single release stream – ScyllaDB Enterprise. Starting with the ScyllaDB Enterprise 2025.1 release (ETA February 2025):
VectorChord: Store 400k Vectors for $1 in PostgreSQL (pgvecto.rs)
We’re pleased to announce our new vector search extension for PostgreSQL, providing a highly cost-effective way to manage large vectors. Using VectorChord, you can achieve a QPS of 131 with 0.95 precision on 100 million 768-dimensional vectors for the top 10 queries. This setup costs only $250 monthly and can be hosted on a single machine.
Supabase AI Assistant v2 (supabase.com)
Today we are releasing Supabase Assistant v2 in the Dashboard - a global assistant with several new abilities:
Introducing integrated inference: Embed, rerank, and retrieve data with one API (pinecone.io)
We’re excited to announce expanded inference capabilities alongside our core vector database to make it even easier and faster to build high-quality, knowledgeable AI applications with Pinecone.
Show HN: WeSQL – An Innovative MySQL That Stores All Data on S3 (github.com/wesql)
WeSQL is an innovative MySQL distribution that adopts a compute-storage separation architecture, with storage backed by S3 (and S3-compatible systems). It can run on any cloud, ensuring no vendor lock-in.
Sqlpkg – The SQLite Extension Hub (sqlpkg.org)
Find SQLite extensions using the search box above. You can download and install them manually, or use the sqlpkg package manager.
Pg_karnak: Transactional schema migration across tenant databases (thenile.dev)
When we need to describe Nile in a single sentence, we say "PostgreSQL re-engineered for multi-tenant apps".
FQL: A KV Query Language (github.com/janderland)
FQL provides a query language and an alternative client API for Foundation DB.
Steps in Writing Analytical SQL Queries (crunchydata.com)
It is never immediately obvious how to go from a simple SQL query to a complex one -- especially if it involves intricate calculations.
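One dependable approach is to build the query incrementally with CTEs, one transformation per step, checking the output at each stage. A runnable sketch against Python's bundled SQLite (the schema and numbers are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('alice', 10), ('alice', 30), ('bob', 5), ('bob', 5), ('carol', 100);
""")

# Each CTE is one step: per-customer totals first, then a grand total,
# then combine the two to get each customer's share of revenue.
query = """
WITH per_customer AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
overall AS (
    SELECT SUM(total) AS grand_total FROM per_customer
)
SELECT p.customer,
       p.total,
       ROUND(100.0 * p.total / o.grand_total, 1) AS pct_of_revenue
FROM per_customer p CROSS JOIN overall o
ORDER BY p.total DESC;
"""
for row in conn.execute(query):
    print(row)
```

Each intermediate CTE can be selected from on its own while you are developing, which is what makes the jump from a simple query to an intricate calculation tractable.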