Hacker News with Generative AI: Performance

Qt 6.9 Released (qt.io)
Qt 6.9 is now available, with new features and improvements for application developers and device creators! As some of the highlights, upgrading to Qt 6.9 brings emoji rendering in existing applications up to the latest standards, order independent transparency for Qt Quick 3D, significant enhancements to address modern OAuth2 requirements, and multiple new performance features across all platforms and devices.
Dual RTX 5090 Beats $25,000 H100 in Real-World LLM Performance (hardware-corner.net)
AI enthusiasts looking for top-tier performance in local LLMs have long considered NVIDIA’s H100 to be the gold standard for inference, thanks to its high-bandwidth HBM3 memory and optimized tensor cores. However, recent benchmarks show that a dual RTX 5090 setup, while still pricey, outperforms the H100 in sustained output token generation, making it an ideal choice for those seeking the best possible performance for home use, especially for models up to 70B parameters.
Show HN: Nue – Apps lighter than a React button (nuejs.org)
On this release we're showing what you can do by taking the modern web standards — HTML, CSS, and JS — to their absolute peak:
Go Optimization Guide (goperf.dev)
Valkey – v8.1.0 GA (github.com/valkey-io)
Upgrade urgency LOW: This is the first release of Valkey 8.1, a minor version update designed to further enhance performance, reliability, observability and usability over Valkey 8.0 for all Valkey installations. This release is fully compatible with all previous Valkey releases as well as Redis OSS 7.2.4.
Show HN: JavaScript PubSub in 163 Bytes (github.com/hassanshaikley)
The smallest PubSub library possible. Zero Dependencies. 149 bytes.
IO_uring Network Zero-Copy Receive Lands in Linux 6.15 (phoronix.com)
IO_uring continues maturing while being one of the greatest innovations within the Linux kernel in the past number of years. With Linux 6.15, IO_uring is getting even more interesting with introducing network zero-copy receive support. With this new code a 200G link could be saturated off a single CPU core in a recent demonstration.
Span<T>.SequenceEquals is faster than memcmp (richardcocks.github.io)
In this post I look at improvements in .NET and using Span for performance and portability.
Minimal CSS-only blurry image placeholders (leanrada.com)
Here’s a CSS technique that produces blurry image placeholders (LQIPs) without cluttering up your markup — Only a single custom property needed!
Towards fearless SIMD, 7 years later (linebender.org)
Seven years ago I wrote a blog post Towards fearless SIMD, outlining a vision for Rust as a compelling language for writing fast SIMD programs. Where are we now?
HTTP/2 zero latency write coalescing (nitely.github.io)
Write coalescing is an I/O optimization technique where multiple small writes are merged into a single larger write before sending data to the underlying system. In HTTP/2, we can batch multiple frames from one or more streams and send them all at once. This reduces the number of syscalls, and avoids sending tiny TCP packets under load.
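As a rough illustration of the general idea (not the linked article's implementation), the sketch below queues small frames and hands them to the underlying sink in a single write. The `CoalescingWriter` class, its `max_pending` threshold, and the `demo` helper are hypothetical names chosen for this example. A simple size threshold plus an explicit flush stands in for whatever scheduling the article uses to avoid added latency.

```python
import io


class CoalescingWriter:
    """Hypothetical helper: buffers small writes and flushes them in one call."""

    def __init__(self, sink, max_pending=16 * 1024):
        self._sink = sink              # any object with a write(bytes) method
        self._pending = []             # frames queued since the last flush
        self._pending_size = 0         # total bytes currently queued
        self._max_pending = max_pending
        self.sink_writes = 0           # number of write calls made to the sink

    def write(self, frame: bytes) -> None:
        # Queue the frame; only touch the sink once enough data has piled up.
        self._pending.append(frame)
        self._pending_size += len(frame)
        if self._pending_size >= self._max_pending:
            self.flush()

    def flush(self) -> None:
        # Merge all queued frames into one buffer and write it once.
        if not self._pending:
            return
        self._sink.write(b"".join(self._pending))
        self.sink_writes += 1
        self._pending.clear()
        self._pending_size = 0


def demo():
    sink = io.BytesIO()
    writer = CoalescingWriter(sink)
    for i in range(100):
        writer.write(b"frame-%d;" % i)   # 100 tiny frames
    writer.flush()                       # one merged write reaches the sink
    return sink.getvalue(), writer.sink_writes
```

With a real socket as the sink, each call to the sink's write would typically correspond to one syscall, so 100 small frames would cost one syscall here instead of 100.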
Calculate Throughput with LLVM's Scheduling Model (myhsu.xyz)
Compiler, uArch, and a little bit of...jigsaw puzzle?
LLVM Back End for MoonBit (moonbitlang.com)
In the past two years, MoonBit has demonstrated significant performance advantages across WebAssembly, JavaScript, and native backends. We believe that for a new language to be truly valuable, it must offer a generational leap in both core performance and developer experience. Today, we are excited to introduce MoonBit LLVM backend support with 8× the performance of Java in the FFT benchmark, and out-of-the-box debugging.
Method dispatch mechanisms in Swift: static and dynamic dispatch (nilcoalescing.com)
Method dispatch refers to the process of determining which method implementation to execute when a method is called. In Swift, this can be either dynamic or static, each with distinct implications for performance and flexibility.
PostgreSQL Lands Initial Support for IO_uring: "Can Be Considerably Faster" (phoronix.com)
As a very exciting improvement for the open-source PostgreSQL database server, it has merged initial support for making use of IO_uring on Linux servers for asynchronous I/O and can provide for some nice performance improvements.
Servo vs. Ladybird (thelibre.news)
I believe that Ladybird has more funding and better support for the web, but Servo wins in performance. Though, they're hard to compare directly!
Linux kernel 6.14 is a big leap forward in performance and Windows compatibility (zdnet.com)
I save cloud costs by hosting local AI (autonomous.ai)
Meet Vanta. Powered by an RTX 4090, it delivers up to 1.32 petaflops of AI performance in a power-efficient, futuristic form factor.
Land ahoy: leaving the Sea of Nodes (v8.dev)
V8’s end-tier optimizing compiler, Turbofan, is famously one of the few large-scale production compilers to use Sea of Nodes (SoN). However, almost 3 years ago, we started to get rid of Sea of Nodes and fall back to a more traditional Control-Flow Graph (CFG) Intermediate Representation (IR), which we named Turboshaft.
Hann: A Fast Approximate Nearest Neighbor Search Library for Go (github.com/habedi)
Hann is a high-performance approximate nearest neighbor search (ANN) library for Go.
JEP Draft: JFR Method Timing and Tracing (openjdk.org)
Extend JDK Flight Recorder (JFR) to support bytecode-based method timing and tracing for quick and easy use.
Why PostgreSQL needs a better API for alternative table engines? (orioledb.com)
For a long time now, PostgreSQL has had an extensible Index Access Method API (called AM), which has stood the test of time and enabled numerous robust extensions to provide their own index types. For example: rum, pgvector, bloom, zombodb and others. PostgreSQL 12 introduced the Table AM API, promising equivalent flexibility for table access methods.
Exploring Ruby Ractors – I paid for 10 cores, I'm gonna use 10 cores (jpterry.com)
“I paid for 10 cores, I’m gonna use 10 cores!”
A Linux laptop with a brilliant display and performance (zdnet.com)
For a laptop with Linux pre-installed, the Tuxedo Computers Infinity Book Pro 14 (Gen 9) offers a stunning display and strong performance.
Shift-to-Middle Array: A Faster Alternative to std::deque? (github.com/attilatorda)
The Shift-To-Middle Array is a dynamic array designed to optimize insertions and deletions at both ends, offering a high-performance alternative to std::deque, std::vector, and linked lists.
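One way to picture a structure like this is a contiguous buffer whose live elements sit in the middle, leaving free slots on both sides so that pushes at either end are usually a single store. The sketch below is an assumption-based illustration, not the linked repository's code: the `MiddleArray` class, its method names, and the "reallocate to three times the element count and recenter" growth policy are all hypothetical choices for this example.

```python
class MiddleArray:
    """Hypothetical sketch: elements live in the middle of one flat buffer."""

    def __init__(self, capacity=8):
        self._buf = [None] * capacity
        self._head = capacity // 2     # index of the first element
        self._tail = capacity // 2     # index one past the last element

    def __len__(self):
        return self._tail - self._head

    def _recenter(self):
        # Copy the live elements into the middle of a fresh buffer so that
        # both ends have free slots again (assumed growth policy: 3x length).
        n = len(self)
        new_cap = max(8, 3 * n)
        start = (new_cap - n) // 2
        new_buf = [None] * new_cap
        new_buf[start:start + n] = self._buf[self._head:self._tail]
        self._buf, self._head, self._tail = new_buf, start, start + n

    def push_front(self, value):
        if self._head == 0:            # no free slot left before the first element
            self._recenter()
        self._head -= 1
        self._buf[self._head] = value

    def push_back(self, value):
        if self._tail == len(self._buf):   # no free slot left after the last element
            self._recenter()
        self._buf[self._tail] = value
        self._tail += 1

    def pop_front(self):
        if self._head == self._tail:
            raise IndexError("pop from empty MiddleArray")
        value = self._buf[self._head]
        self._buf[self._head] = None   # drop the reference
        self._head += 1
        return value

    def pop_back(self):
        if self._head == self._tail:
            raise IndexError("pop from empty MiddleArray")
        self._tail -= 1
        value = self._buf[self._tail]
        self._buf[self._tail] = None   # drop the reference
        return value

    def to_list(self):
        return self._buf[self._head:self._tail]
```

Because the elements stay contiguous, iteration touches one flat block of memory, which is the usual argument for this kind of layout over a linked list or a segmented deque.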
RDNA 4's “Out-of-Order” Memory Accesses (chipsandcheese.com)
AMD's RDNA 4 brings a variety of memory subsystem enhancements. Among those, one slide stood out because it dealt with out-of-order memory accesses. According to the slide, RDNA 4 allows requests from different shaders to be satisfied out-of-order, and adds new out-of-order queues for memory requests.
Engineering a Trace Details Page That Handles a Million Spans (signoz.io)
Polypane, The browser for ambitious web developers (polypane.app)
A desktop browser with all the tools you need to build responsive, accessible and performant sites.
MySQL transactions per second vs. fsyncs per second (2020) (sirupsen.com)
How many transactions (‘writes’) per second is MySQL capable of?
Linux 6.14 Sees Last Minute Fix for a Two Year Old Regression 30% Perf Drop (phoronix.com)
Submitted today ahead of the Linux 6.14 stable release expected Sunday is a lone scheduler fix for the kernel.