Hacker News with Generative AI: Distributed Systems

CRDTs #2: Turtles All the Way Down (jhellerstein.github.io)
Modern distributed systems often seem to rest on a stack of turtles. For every guarantee we make, we seem to rely on a lower-layer assumption. Eventually we're left wondering: what is at the bottom?
CRDTs: Pros and Cons (Lattices and Lettuces?) (jhellerstein.github.io)
Over the next few days, I'm going to post a number of observations about CRDTs: Convergent Replicated Data Types. These are data structures that aspire to help us with coordination-free distributed programming, a topic that interests me a lot. How can developers (or languages/compilers) deliver distributed programs that are safe or correct in important ways, without employing expensive mechanisms for coordination that make the global cloud run as slowly as a sequential computer?
LLM-D: Kubernetes-Native Distributed Inference at Scale (github.com/llm-d)
llm-d is a Kubernetes-native distributed inference serving stack - a well-lit path for anyone to serve large language models at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
LLM-D: Kubernetes-Native Distributed Inference (llm-d.ai)
llm-d is a Kubernetes-native high-performance distributed LLM inference framework - a well-lit path for anyone to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
The value of model checking in distributed protocols design (protocols-made-fun.com)
Recently, we published two technical papers on arXiv that both use model checkers as the main vehicle for verifying properties of fault-tolerant distributed algorithms.
Programming Models for Correct and Modular Distributed Systems (eecs.berkeley.edu)
Distributed systems are a fundamental part of modern computing, but they are notoriously difficult to program.
A lost decade chasing distributed architectures for data analytics? (duckdb.org)
TL;DR: We benchmark DuckDB on a 2012 MacBook Pro to decide: did we lose a decade chasing distributed architectures for data analytics?
Sheepdog - a distributed storage system for QEMU (github.com/sheepdog)
Ask HN: What's your go-to message queue in 2025? (ycombinator.com)
The space is confusing to say the least. Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases. I’m trying to make sense of the tradeoffs between:
- async fire-and-forget pub/sub vs. sync RPC-like point to point communication
- simple FIFO vs. priority queues and delay queues
- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g.
Fossil: A Coherent Software Configuration Management System (fossil-scm.org)
Fossil is a simple, high-reliability, distributed SCM system with these advanced features:
FlowG – Distributed Systems without raft (part 2) (medium.com)
Recently, I published the v0.37.0 release of FlowG, a free and open-source low-code log processing tool:
Garbage collection of object storage at scale (warpstream.com)
Over the last 10 years, I’ve built several distributed systems on top of object storage, with WarpStream being the most recent.
TScale – Distributed training on consumer GPUs (github.com/Foreseerr)
This repo contains transformer training and inference code written in C++ and CUDA.
Building MapReduce (Based on Google Paper) (ycombinator.com)
I read the MapReduce paper recently and wanted to try out its internal workings by building it from scratch (at least a minimal version). Hope it helps someone trying to reproduce the same paper in the future.
Using only half the outbox pattern (medium.com)
In distributed systems, reliable communication between services cannot be taken for granted. You might update a database record successfully, but if publishing an event to Kafka or RabbitMQ fails immediately after, inconsistencies can appear — issues that may not be visible right away but can cause serious problems later.
Node.js implementation of the BitTorrent DHT protocol (npmjs.com)
Node.js implementation of the BitTorrent DHT protocol. BitTorrent DHT is the main peer discovery layer for BitTorrent, which allows for trackerless torrents. DHTs are awesome!
Sharding Mastodon, Part 1 (pgdog.dev)
What If We Could Rebuild Kafka from Scratch? (morling.dev)
The last few days I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, tightly integrating Apache Kafka and object storage, such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.
Ask HN: Has anyone used Riak? Thoughts? (ycombinator.com)
I’ve just stumbled upon Riak. It seems like a very cool technology. Almost like an alternative to Kubernetes. Has anyone used it in production? Why isn’t it more well known? It seems like an awesome solution.
Decomposing Transactional Systems (transactional.blog)
Consistent Hash Ring (selfboot.cn)
Consistent Hashing Ring is a special hashing algorithm primarily used for data distribution and load balancing in distributed systems.
Graham: Synchronizing Clocks by Leveraging Local Clock Properties (usenix.org)
High performance, strongly consistent applications are beginning to require scalable sub-microsecond clock synchronization.
KIP-1150: Diskless Kafka Topics (apache.org)
Erlang's not about lightweight processes and message passing (2023) (stevana.github.io)
I used to think that the big idea of Erlang is its lightweight processes and message passing. Over the last couple of years I’ve realised that there’s a bigger insight to be had, and in this post I’d like to share it with you.
Engineering a Trace Details Page That Handles a Million Spans (signoz.io)
Building a modern durable execution engine from first principles (restate.dev)
We dive into the architecture details of Restate, a Durable Execution engine we built from the ground up. Restate requires no database/log or other system, but implements a full stack that competes with the best logs in terms of durability and operations.
Colossus: How we deliver SSD performance at HDD prices (cloud.google.com)
From YouTube and Gmail to BigQuery and Cloud Storage, almost all of Google’s products depend on Colossus, our foundational distributed storage system.
The Synchrony Budget (morling.dev)
For building a system of distributed services, one concept I think is very valuable to keep in mind is what I call the synchrony budget: as much as possible, a service should minimize the number of synchronous requests which it makes to other services.
Conflict-Free Distributed Architecture for Append-Only Writes to Apache Iceberg (e6data.com)
Apache Iceberg is a cornerstone table format in modern data lakehouse systems. It is renowned for its ability to deliver transactional consistency, schema evolution, and snapshot isolation through a metadata-driven architecture.