Hacker News with Generative AI: Distributed Systems

rqlite turns 10: Lessons from a decade building Distributed Systems (philipotoole.com)
rqlite is a lightweight, open-source, distributed relational database written in Go, which uses SQLite as its storage engine and Raft for consensus.
Making Postgres Distributed with FoundationDB (fabianlindfors.se)
Turning the revered Postgres into a distributed database is a tall order but not a new idea.
CRDTs #2: Turtles All the Way Down (jhellerstein.github.io)
Modern distributed systems often seem to rest on a stack of turtles. For every guarantee we make, we seem to rely on a lower-layer assumption. Eventually we're left wondering: what is at the bottom?
CRDTs: Pros and Cons (Lattices and Lettuces?) (jhellerstein.github.io)
Over the next few days, I'm going to post a number of observations about CRDTs: Convergent Replicated Data Types. These are data structures that aspire to help us with coordination-free distributed programming, a topic that interests me a lot. How can developers (or languages/compilers) deliver distributed programs that are safe or correct in important ways, without employing expensive mechanisms for coordination that make the global cloud run as slowly as a sequential computer?
From RPC to transactions and durable executions (pramodb.com)
I spent some time reading about “Durable Execution Engines” (e.g., Temporal) and explored possible connections to earlier concepts like database transactions, distributed transactions, and building RPC/microservice-based systems in a fault-tolerant manner. In this post I’ll try to summarize some of my learnings. How useful it is will depend on how much of this you already know! Among other things, I relied on these great overviews: The Modern Transactional Stack (by some a16z folks) and What is Durable Execution?.
LLM-D: Kubernetes-Native Distributed Inference at Scale (github.com/llm-d)
llm-d is a Kubernetes-native distributed inference serving stack - a well-lit path for anyone to serve large language models at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
LLM-D: Kubernetes-Native Distributed Inference (llm-d.ai)
llm-d is a Kubernetes-native high-performance distributed LLM inference framework - a well-lit path for anyone to serve at scale, with the fastest time-to-value and competitive performance per dollar for most models across most hardware accelerators.
The value of model checking in distributed protocols design (protocols-made-fun.com)
Recently, we have published two technical papers on arXiv that are both using model checkers as the main vehicle for verifying properties of fault-tolerant distributed algorithms.
Programming Models for Correct and Modular Distributed Systems (eecs.berkeley.edu)
Distributed systems are a fundamental part of modern computing, but they are notoriously difficult to program.
A lost decade chasing distributed architectures for data analytics? (duckdb.org)
TL;DR: We benchmark DuckDB on a 2012 MacBook Pro to decide: did we lose a decade chasing distributed architectures for data analytics?
Sheepdog - a distributed storage system for QEMU (github.com/sheepdog)
Ask HN: What's your go-to message queue in 2025? (ycombinator.com)
The space is confusing to say the least.
Message queues are usually a core part of any distributed architecture, and the options are endless: Kafka, RabbitMQ, NATS, Redis Streams, SQS, ZeroMQ... and then there's the “just use Postgres” camp for simpler use cases.
I’m trying to make sense of the tradeoffs between:
- async fire-and-forget pub/sub vs. sync RPC-like point to point communication
- simple FIFO vs. priority queues and delay queues
- intelligent brokers (e.g. RabbitMQ, NATS with filters) vs. minimal brokers (e.g.
Fossil: A Coherent Software Configuration Management System (fossil-scm.org)
Fossil is a simple, high-reliability, distributed SCM system with these advanced features:
FlowG – Distributed Systems without raft (part 2) (medium.com)
Recently, I published the v0.37.0 release of FlowG, a Free and OpenSource low-code log processing software:
Garbage collection of object storage at scale (warpstream.com)
Over the last 10 years, I’ve built several distributed systems on top of object storage, with WarpStream being the most recent.
TScale – Distributed training on consumer GPUs (github.com/Foreseerr)
This repo contains transformer train and inference code written in C++ and CUDA.
Building MapReduce (Based on Google Paper) (ycombinator.com)
I read the MapReduce paper recently and wanted to try out the internal workings by building it from scratch (at least a minimal version). Hope it helps someone trying to reproduce the same paper in the future.
Using only half the outbox pattern (medium.com)
In distributed systems, reliable communication between services cannot be taken for granted. You might update a database record successfully, but if publishing an event to Kafka or RabbitMQ fails immediately after, inconsistencies can appear — issues that may not be visible right away but can cause serious problems later.
Node.js implementation of the BitTorrent DHT protocol (npmjs.com)
Node.js implementation of the BitTorrent DHT protocol. BitTorrent DHT is the main peer discovery layer for BitTorrent, which allows for trackerless torrents. DHTs are awesome!
Sharding Mastodon, Part 1 (pgdog.dev)
What If We Could Rebuild Kafka from Scratch? (morling.dev)
Over the last few days I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, which tightly integrates Apache Kafka with object storage such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of using Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration.
Ask HN: Has anyone used Riak? Thoughts? (ycombinator.com)
I’ve just stumbled upon Riak. It seems like a very cool technology. Almost like an alternative to Kubernetes. Has anyone used it in production? Why isn’t it more well known? It seems like an awesome solution.
Decomposing Transactional Systems (transactional.blog)
Consistent Hash Ring (selfboot.cn)
Consistent Hashing Ring is a special hashing algorithm primarily used for data distribution and load balancing in distributed systems.
Graham: Synchronizing Clocks by Leveraging Local Clock Properties (usenix.org)
High performance, strongly consistent applications are beginning to require scalable sub-microsecond clock synchronization.
KIP-1150: Diskless Kafka Topics (apache.org)
Erlang's not about lightweight processes and message passing (2023) (stevana.github.io)
I used to think that the big idea of Erlang is its lightweight processes and message passing. Over the last couple of years I’ve realised that there’s a bigger insight to be had, and in this post I’d like to share it with you.
Engineering a Trace Details Page That Handles a Million Spans (signoz.io)
Building a modern durable execution engine from first principles (restate.dev)
We dive into the architecture details of Restate, a Durable Execution engine we built from the ground up. Restate requires no database/log or other system, but implements a full stack that competes with the best logs in terms of durability and operations.