Hacker News with Generative AI: Testing

Texas' annual reading test adjusted difficulty yearly, masking improvement (theconversation.com)
Texas children’s performance on an annual reading test was basically flat from 2012 to 2021, even as the state spent billions of additional dollars on K-12 education.
Using Postgres pg_test_fsync tool for testing low latency writes (tanelpoder.com)
Here’s a useful tool for quickly testing whether a disk (or a cloud block store volume) is a good candidate for your database WAL/redo logs and any other files that require low latency writes. The pg_test_fsync tool is bundled with standard Postgres packages, so no extra installation is needed. You don’t actually have to use Postgres as your database; this tool’s output is useful for any workload requiring fast writes.
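To give a rough feel for what such a test measures, here is a minimal Python sketch that times repeated write-plus-fsync cycles on a scratch file. It is not pg_test_fsync and does not reproduce its output or its comparison of sync methods; the function name `measure_fsync_latency`, the 8 KiB block size, and the write count are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def measure_fsync_latency(directory, writes=50, block_size=8192):
    """Time `writes` cycles of (overwrite one block, fsync) in `directory`.

    Returns a list of per-cycle latencies in seconds. Hypothetical helper,
    not part of Postgres or pg_test_fsync.
    """
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block each time
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to flush the file to the storage device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # Point this at a directory on the volume you want to evaluate.
    samples = measure_fsync_latency(tempfile.gettempdir())
    avg = sum(samples) / len(samples)
    print(f"avg latency: {avg * 1e6:.0f} us, about {1 / avg:.0f} fsyncs/sec")
```

Numbers from a sketch like this are only indicative: interpreter overhead, filesystem settings, and device write caches all affect the result, so the bundled tool remains the better choice for real evaluations.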
Silencing Firefox's Chattiness for Web App Testing (secureideas.com)
Firefox is one chatty browser! Even if you don’t actually use it for anything, it’s constantly making requests out to the internet for things like updates, checking network status, and sending telemetry data back to Mozilla. When using Firefox for web app testing, I’ve often noticed the constant stream of additional requests that get in the way. It’s possible to ignore them, but it turns out that it’s also really easy to disable most of that traffic.
Mutmut – Python Mutation Tester (github.com/boxed)
Mutmut is a mutation testing system for Python, with a strong focus on ease of use.
Show HN: Rocketship – Open-source E2E testing that's self-hostable (github.com/rocketship-ai)
🚀 Rocketship is an open-source testing engine that can verify complex, API-driven scenarios that are made by your customers or your systems.
A Simple Spit Test Could Reveal Prostate Cancer, Outperforming a Blood Test (discovermagazine.com)
A simple spit test that can be taken at home is among the most promising methods for detecting prostate cancer, which an estimated 1 in 8 men are diagnosed with during their lifetime.
Launch HN: Jazzberry (YC X25) – AI agent for finding bugs (ycombinator.com)
We are building Jazzberry (https://jazzberry.ai), an AI bug finder that automatically tests your code when a pull request occurs to find and flag real bugs before they are merged.
Making PyPI's test suite faster (trailofbits.com)
Trail of Bits has collaborated with PyPI for several years to add features and improve security defaults across the Python packaging ecosystem.
Using tests as a debugging tool for logic errors (qodo.ai)
In Java development, logic errors constitute a unique class of defects where code executes flawlessly according to its written instructions while systematically violating business requirements.
Grafana K6 v1.0.0 (github.com/grafana)
After 9 years of iteration and countless community contributions, we're thrilled to announce Grafana k6 v1.0.
Elm Test Distributions (janiczek.cz)
…in which I’ll tell you how you can make sure your property based tests are testing the interesting cases.
Swarm Testing Data Structures (tigerbeetle.com)
We discovered a cute little pattern the other day when refactoring TigerBeetle’s intrusive queue: using Zig’s comptime reflection to exhaustively test a data structure’s public API. Isn’t it cool when your property test fails when you add a new API, because “public API is tested” is one of the properties you test?!
Show HN: Magnitude – open-source, AI-native test framework for web apps (github.com/magnitudedev)
Magnitude: The open source, AI-native testing framework for web apps
AI helped write California bar exam, sparking uproar (arstechnica.com)
On Monday, the State Bar of California revealed that it used AI to develop a portion of multiple-choice questions on its February 2025 bar exam, causing outrage among law school faculty and test takers.
Unpowered SSD endurance investigation finds data loss and performance issues (tomshardware.com)
Antithesis Driven Testing (sqlsync.dev)
I want a test system smart enough to discover the bugs I can’t anticipate.
Streaming Platform for Canadian/American Content (Need Testers) (ycombinator.com)
Test Spies in Haskell (jezenthomas.com)
When testing a web application, you often want to make sure that a certain email would be sent — without actually sending it. How do you test that?
Scenario: Using Agents to Test Your Agents (github.com/langwatch)
Scenario is a library for testing agents end-to-end as a human would, but without having to manually do it. The automated testing agent covers every single scenario for you.
Ask HN: How to unit test AI responses? (ycombinator.com)
I am tasked with building a customer support chat. The AI should be trained on company docs. How can I be sure the AI will not hallucinate a bad response to a customer?
Local CI. Sign off on your own work (github.com/basecamp)
A GitHub CLI extension for local CI. Run your tests on your own machine and sign off when they pass.
Cargo-mutants:zombie: Inject bugs and see if your tests catch them (github.com/sourcefrog)
cargo-mutants helps you improve your program's quality by finding places where bugs could be inserted without causing any tests to fail.
Try: Test anti-framework via CL Condition System (github.com/melisgl)
Try is an extensible test anti-framework with equal support for interactive and non-interactive workflows.
Setup QEMU Output to Serial Console and Automate Tests with Shell Scripts (2019) (fadeevab.com)
While struggling to automate a QEMU guest (communicating with it and controlling it from shell scripts), I ran into a lot of incomplete, partially working solutions around the Internet. I now have a pretty decent collection of working recipes for tuning up a QEMU guest, so I decided to organize them all here, where they could be useful to anyone else.
Pytest for Neovim (github.com/richardhapb)
Testing integrated in Neovim with pytest. Includes Docker support. This project is in progress; I will be adding more features in the future and I am open to contributions.
Deterministic simulation testing for async Rust (s2.dev)
You Don't Have Time Not to Test (medium.com)
Testing isn’t a sunk cost. It’s a compounding return that shapes better code and ultimately accelerates your team.
The right way to do data fixtures in Go (brandur.org)
Every test suite should establish a strong convention for generating data fixtures early on. If it doesn’t, data fixtures will still emerge (they’re that necessary), but in a poorly designed form, with no API (or a poorly designed one) and no standardization.
Our own worst best customer (antithesis.com)
At Antithesis, our job is to break software before it breaks in production – ours included. We’ve spent years stress-testing our systems with property-based testing and deterministic simulation, not just because it makes our software more reliable, but because it actually makes us faster.