Hacker News with Generative AI: Data Storage

Reducing Cloud Spend: Migrating Logs from CloudWatch to Iceberg with Postgres (crunchydata.com)
As a database service provider, we store a number of logs internally to audit and oversee what is happening within our systems.
Scoping a Local-First Image Archive (scottishstoater.com)
For years, I’ve been thinking about how we store and access our digital files, especially photos.
Preview: Amazon S3 Tables and Lakehouse in DuckDB (duckdb.org)
TL;DR: We are happy to announce a new preview feature that adds support for Apache Iceberg REST Catalogs, enabling DuckDB users to connect to Amazon S3 Tables and Amazon SageMaker Lakehouse with ease.
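For anyone who wants to kick the tires, here is a minimal sketch of the preview through DuckDB's Python API. The ARN, region, and table names are placeholders, and the option spellings follow the announcement as written, so check the post if the extension has since changed:

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL iceberg;")  # ships the Iceberg REST catalog support
    con.sql("LOAD iceberg;")

    # Pick up AWS credentials from the standard credential chain.
    con.sql("CREATE SECRET (TYPE s3, PROVIDER credential_chain);")

    # Placeholder ARN: attach an S3 Tables bucket as an Iceberg catalog.
    con.sql("""
        ATTACH 'arn:aws:s3tables:us-east-1:111122223333:bucket/my-bucket'
        AS s3_tables (TYPE iceberg, ENDPOINT_TYPE s3_tables);
    """)

    # Tables in the bucket are now queryable like any other DuckDB catalog.
    con.sql("SELECT * FROM s3_tables.my_namespace.my_table LIMIT 10;").show()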
The real failure rate of EBS (planetscale.com)
PlanetScale has deployed millions of Amazon Elastic Block Store (EBS) volumes across the world. We create and destroy tens of thousands of them every day as we stand up databases for customers, take backups, and test our systems end-to-end. Through this experience, we have a unique viewpoint into the failure rate and mechanisms of EBS, and we have spent a lot of time working out how to mitigate those failures.
Archival Storage (dshr.org)
I'm honored to appear in what I believe is the final series of these seminars. Most of my previous appearances have focused on debunking some conventional wisdom, and this one is no exception. My parting gift to you is to stop you wasting time and resources on yet another seductive but impractical idea — that the solution to storing archival data is quasi-immortal media. As usual, you don't have to take notes.
Theory crafting a system for 1000 simultaneous micro SD card ingests (level1techs.com)
Ask HN: What do you think of BDXL (100GB disks)? (ycombinator.com)
I still have a need to archive data and I'm thinking about getting a BDXL writer and some disks. Is this a dumb thing to do in 2025?
Put a data center on the moon? (ieee.org)
Lonestar Data Holdings is sending a test mission, aiming to safeguard valuable data
Hard Drive Graveyard (benjdd.com)
What 5 Megabytes of Data Looked Like in 1966 (62,500 punched cards) (vintag.es)
In 1966, computing was in its infancy, and the concept of data storage and processing looked drastically different from today’s instant access to vast amounts of information.
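The headline arithmetic is easy to verify: a standard IBM punched card held 80 columns, one character (roughly one byte) per column, so

    62,500 cards × 80 bytes/card = 5,000,000 bytes ≈ 5 MB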
Are SSDs more reliable than hard drives? (2021) (backblaze.com)
Solid-state drives (SSDs) continue to become more and more a part of the data storage landscape. And while our SSD 101 series has covered topics like upgrading, troubleshooting, and recycling your SSDs, we’d like to test one of the more popular declarations from SSD proponents: that SSDs fail much less often than our old friend, the hard disk drive (HDD).
12 years of Backblaze data center storage drives, visualized (benjdd.com)
1 small node -> 100 drives
Backblaze Drive Stats for 2024 (backblaze.com)
As of December 31, 2024, we had 305,180 drives under management. Of that number, there were 4,060 boot drives and 301,120 data drives. This report will focus on those data drives as we review the Q4 2024 annualized failure rates (AFR), the 2024 failure rates, and the lifetime failure rates for the drive models in service as of the end of 2024.
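For readers new to Drive Stats, Backblaze computes AFR from drive days rather than drive counts, which accounts for drives entering and leaving service mid-year:

    AFR = drive failures / (drive days / 365) × 100%

    Illustrative example (not figures from the report):
    50 failures / (3,650,000 drive days / 365) × 100% = 0.5%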
Seagate's HDD scandal deepens: clues point at Chinese Chia mining farms (tomshardware.com)
Cloudflare R2 Incident on February 6, 2025 (cloudflare.com)
Multiple Cloudflare services, including our R2 object storage, were unavailable for 59 minutes on Thursday, February 6th. This caused all operations against R2 to fail for the duration of the incident, and caused a number of other Cloudflare services that depend on R2 — including Stream, Images, Cache Reserve, Vectorize and Log Delivery — to suffer significant failures.
For privacy: Change of our refund policy from 30 to 14 days (mullvad.net)
As part of our ongoing commitment to storing less user data and protecting your privacy, we’re updating our refund policy.
Husky: Efficient Compaction at Datadog Scale (datadoghq.com)
In a previous blog post, we introduced our Husky event store system. Husky is a distributed storage system layered over object storage (e.g., Amazon S3, Google Cloud Storage, or Azure Blob Storage), with the query system acting as a cache over this storage. We also did a deep dive into the ingestion pipelines we built to handle the scale of our customer data. In this post, we’ll cover how we designed Husky’s underlying data storage layer.
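As orientation for that post, here is a toy sketch of what compaction means in such a system: many small sorted runs written by ingestion are merged into fewer large ones so queries open fewer objects. The file handling below is deliberately simplified and is not Husky's actual implementation:

    import heapq

    def compact(runs: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
        """Merge many small sorted runs into one large sorted run.

        A real system streams runs out of object storage, writes the
        merged result back as a new object, and atomically swaps the
        metadata; this version just merges in memory.
        """
        return list(heapq.merge(*runs, key=lambda kv: kv[0]))

    # Three small "objects", each sorted by key, as ingestion might leave them.
    runs = [
        [("a", "1"), ("m", "2")],
        [("b", "3"), ("z", "4")],
        [("c", "5"), ("n", "6")],
    ]
    print(compact(runs))  # one sorted run covering all six keys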
Apache Accumulo 4.0 Feature Preview (apache.org)
Apache Accumulo® is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval.
Storage is cheap, but not thinking about logging is expensive (counting-stuff.com)
The bad habits of data over-collection run deep.
Seagate smashes largest HDD world record with 36TB hard drive (techradar.com)
Parquet and ORC's many shortfalls for machine learning, and what to do about it? (starburst.io)
At the turn of the century (around a quarter of a century ago), over 99% of the data management industry used row-oriented storage for all workloads involving structured data — including transactional and analytical workloads.
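The row-versus-column distinction the article builds on, in miniature: row-oriented storage keeps each record's fields together, while columnar formats such as Parquet and ORC keep each field's values together, so an analytical scan touches only the columns it needs:

    # The same three records, laid out both ways.
    rows = [
        ("alice", 30, "NYC"),
        ("bob",   25, "SF"),
        ("carol", 35, "LA"),
    ]
    columns = {
        "name": ["alice", "bob", "carol"],
        "age":  [30, 25, 35],
        "city": ["NYC", "SF", "LA"],
    }

    # Average age: the columnar layout reads one contiguous array, while
    # the row layout must walk every record to pluck out a single field.
    avg_from_rows = sum(r[1] for r in rows) / len(rows)
    avg_from_cols = sum(columns["age"]) / len(columns["age"])
    assert avg_from_rows == avg_from_cols == 30.0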
37signals Dev – Monitoring 10 Petabytes of Data in Pure Storage (37signals.com)
How we use Prometheus to provide metrics and alerts for Pure Storage.
I Track My Health Data in Markdown: Lessons in Digital Longevity (ycombinator.com)
I’ve spent years tracking my sleep, diet, and exercise with apps and wearables. But here’s the problem: when an app gets discontinued or stops syncing, the data—and all the insights—disappear.
Century-Scale Storage (law.harvard.edu)
This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all.
Internet Object – New Age Data Serialization After JSON (internetobject.org)
Revolutionize your data exchange and storage with a format that's built for efficiency, clarity and reliability. A text-based data serialization and structured storage format beyond JSON!
Terabit-scale high-fidelity diamond data storage (nature.com)
In the era of digital information, realizing efficient and durable data storage solutions is paramount.
Hetzner Object Storage (hetzner.com)
Object Storage is the S3 compatible storage solution that grows with your data requirements - highly available, secure and flexible.
Big Endian's Guide to SQLite Storage (jabid.in)
I wanted to learn how databases like SQLite store data under the hood, so I decided to write some code to inspect the database file. SQLite famously stores the entire database in a single file, and the file format is very well documented. Here is one diagram to get started instead of the roughly 13,848 words in that document.
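The inspection starts at the documented 100-byte database header, whose multi-byte fields are all big-endian (hence the title). A minimal sketch of reading a few of those fields; the file path is a placeholder, and the offsets come from the file format documentation:

    import struct

    with open("my.db", "rb") as f:  # placeholder path
        header = f.read(100)        # the 100-byte database header

    # Bytes 0-15: the magic string identifying an SQLite 3 database.
    assert header[:16] == b"SQLite format 3\x00"

    # Offset 16: page size, 2-byte big-endian (a stored 1 means 65536).
    (page_size,) = struct.unpack_from(">H", header, 16)

    # Offset 28: database size in pages, 4-byte big-endian.
    (page_count,) = struct.unpack_from(">I", header, 28)

    print(f"page size:  {page_size} bytes")
    print(f"page count: {page_count}")
    print(f"file size:  {page_size * page_count} bytes")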
Hide Photos on Floppies with a Flux Imager (github.com/dbalsom)