Hacker News with Generative AI: Data Storage

What 5 Megabytes of Data Looked Like in 1966 (62,500 punched cards) (vintag.es)
In 1966, computing was in its infancy, and the concept of data storage and processing looked drastically different from today’s instant access to vast amounts of information.
Are SSDs more reliable than hard drives? (2021) (backblaze.com)
Solid-state drives (SSDs) continue to become more and more a part of the data storage landscape. And while our SSD 101 series has covered topics like upgrading, troubleshooting, and recycling your SSDs, we’d like to test one of the more popular declarations from SSD proponents: that SSDs fail much less often than our old friend, the hard disk drive (HDD).
Backblaze Drive Stats for 2024 (backblaze.com)
As of December 31, 2024, we had 305,180 drives under management. Of that number, there were 4,060 boot drives and 301,120 data drives. This report will focus on those data drives as we review the Q4 2024 annualized failure rates (AFR), the 2024 failure rates, and the lifetime failure rates for the drive models in service as of the end of 2024.
Seagate's HDD scandal deepens: clues point at Chinese Chia mining farms (tomshardware.com)
Cloudflare R2 Incident on February 6, 2025 (cloudflare.com)
Multiple Cloudflare services, including our R2 object storage, were unavailable for 59 minutes on Thursday, February 6th. This caused all operations against R2 to fail for the duration of the incident, and caused a number of other Cloudflare services that depend on R2 — including Stream, Images, Cache Reserve, Vectorize and Log Delivery — to suffer significant failures.
For privacy: Change of our refund policy from 30 to 14 days (mullvad.net)
As part of our ongoing commitment to storing less user data and protecting your privacy, we’re updating our refund policy.
Husky: Efficient Compaction at Datadog Scale (datadoghq.com)
In a previous blog post, we introduced our Husky event store system. Husky is a distributed storage system that is layered over object storage (e.g., Amazon S3, Google Cloud Storage, Azure Blob Storage), with the query system acting as a cache over this storage. We also did a deep dive into Husky’s ingestion pipelines that we built to handle the scale of our customer data. In this post, we’ll cover how we designed Husky’s underlying data storage layer.
Apache Accumulo 4.0 Feature Preview (apache.org)
Apache Accumulo® is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval.
Storage is cheap, but not thinking about logging is expensive (counting-stuff.com)
The bad habits of data over-collection run deep.
Seagate smashes largest HDD world record with 36TB hard drive (techradar.com)
Parquet and ORC's many shortfalls for machine learning, and what to do about it? (starburst.io)
At the turn of the century (around a quarter of a century ago), over 99% of the data management industry used row-oriented storage to store data for all workloads involving structured data — including transactional and analytical workloads.
37signals Dev – Monitoring 10 Petabytes of Data in Pure Storage (37signals.com)
How we use Prometheus to have metrics and alerts for Pure Storage.
I Track My Health Data in Markdown: Lessons in Digital Longevity (ycombinator.com)
I’ve spent years tracking my sleep, diet, and exercise with apps and wearables. But here’s the problem: when an app gets discontinued or stops syncing, the data—and all the insights—disappear.
Century-Scale Storage (law.harvard.edu)
This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all.
Internet Object – New Age Data Serialization After JSON (internetobject.org)
Revolutionize your data exchange and storage with a format that's built for efficiency, clarity and reliability. A Text Based Data Serialization and Structured Storage Format Beyond JSON!
Terabit-scale high-fidelity diamond data storage (nature.com)
In the era of digital information, realizing efficient and durable data storage solutions is paramount.
Hetzner Object Storage (hetzner.com)
Object Storage is the S3-compatible storage solution that grows with your data requirements: highly available, secure and flexible.
Big Endian's Guide to SQLite Storage (jabid.in)
I wanted to learn how databases like SQLite store data under the hood, so I decided to write some code to inspect the database file. SQLite famously stores the entire database in a single file, and the file format is very well documented. Here is one diagram to get started instead of the roughly 13,848 words in that document.
Hide Photos on Floppies with a Flux Imager (github.com/dbalsom)
Chinese researchers indicate diamonds can store data for millions of years (readwrite.com)
Research has suggested that diamond-based storage technology could preserve vast amounts of information for up to millions of years.
Amazon S3 Adds Put-If-Match (Compare-and-Swap) (amazon.com)
Amazon S3 can now perform conditional writes that evaluate if an object is unmodified before updating it.
Transposing Tensor Files (mmapped.blog)
The safetensors library from Huggingface is popular for representing tensors on disk, and its data layout is fully compatible with the onnx raw tensor data format.
Amazon S3 now supports the ability to append data to an object (amazon.com)
Amazon S3 Express One Zone now supports the ability to append data to an object.
Huawei developing SSD-tape hybrid amid US tech restrictions (blocksandfiles.com)
Huawei’s in-house development of Magneto-Electric Disk (MED) archive storage technology combines an SSD with a Huawei-developed tape drive to provide warm (nearline) and cold data storage.
Transactional Object Storage? (mbrt.dev)
I was frustrated by the gap between stateless and stateful applications in the cloud. While I could easily spin up a stateless application as a “serverless” function in any major cloud provider and pretty much forget about it, persisting data between requests was a game of pick two among three: cheap, strongly consistent, portable.
Upspin: A framework for naming everyone's everything (upspin.io)
Upspin is an attempt to address problems like these, and many more.
Backblaze Drive Stats for Q3 2024 (backblaze.com)
As of the end of Q3 2024, Backblaze was monitoring 292,647 hard disk drives (HDDs) and solid state drives (SSDs) in our cloud storage servers located in our data centers around the world.
Floppy Disk Storage (history) (ibm.com)
The once-ubiquitous data storage device gave rise to the modern software industry.
Show HN: OpenDAL, one API to access all the storages (S3, Azblob, HDFS, etc.) (github.com/apache)
Apache OpenDAL™: Access Data Freely