Hacker News with Generative AI: Testing

Show HN: Magnitude – open-source, AI-native test framework for web apps (github.com/magnitudedev)
Magnitude: The open source, AI-native testing framework for web apps
AI helped write California bar exam, sparking uproar (arstechnica.com)
On Monday, the State Bar of California revealed that it used AI to develop a portion of multiple-choice questions on its February 2025 bar exam, causing outrage among law school faculty and test takers.
Unpowered SSD endurance investigation finds data loss and performance issues (tomshardware.com)
Antithesis Driven Testing (sqlsync.dev)
I want a test system smart enough to discover the bugs I can’t anticipate.
Streaming Platform for Canadian/American Content (Need Testers) (ycombinator.com)
Test Spies in Haskell (jezenthomas.com)
When testing a web application, you often want to make sure that a certain email would be sent — without actually sending it. How do you test that?
Scenario: Using Agents to Test Your Agents (github.com/langwatch)
Scenario is a library for testing agents end-to-end as a human would, but without having to manually do it. The automated testing agent covers every single scenario for you.
Ask HN: How to unit test AI responses? (ycombinator.com)
I am tasked with building a customer support chat. The AI should be trained on company docs. How can I be sure the AI will not hallucinate a bad response to a customer?
Local CI. Sign off on your own work (github.com/basecamp)
A GitHub CLI extension for local CI. Run your tests on your own machine and sign off when they pass.
Cargo-mutants: Inject bugs and see if your tests catch them (github.com/sourcefrog)
cargo-mutants helps you improve your program's quality by finding places where bugs could be inserted without causing any tests to fail.
Try: Test anti-framework via CL Condition System (github.com/melisgl)
Try is an extensible test anti-framework with equal support for interactive and non-interactive workflows.
Setup QEMU Output to Serial Console and Automate Tests with Shell Scripts (2019) (fadeevab.com)
While struggling to automate a QEMU guest (communicating with and controlling it from shell scripts), I ran into a lot of incomplete, partially working solutions around the Internet. I now have a pretty decent collection of working recipes for tuning up a QEMU guest, so I decided to organize them here, where they could be useful to anyone else.
Pytest for Neovim (github.com/richardhapb)
Testing integrated in Neovim with pytest. Includes Docker support. This project is in progress; I will be adding more features in the future, and I am open to contributions.
Deterministic simulation testing for async Rust (s2.dev)
You Don't Have Time Not to Test (medium.com)
Testing isn’t a sunk cost. It’s a compounding return that shapes better code and ultimately accelerates your team.
The right way to do data fixtures in Go (brandur.org)
Every test suite should start early in building a strong convention to generate data fixtures. If it doesn’t, data fixtures will still emerge (they’re that necessary), but in a way that’s poorly designed, with no API (or a poorly designed one), and not standardized.
Our own worst best customer (antithesis.com)
At Antithesis, our job is to break software before it breaks in production – ours included. We’ve spent years stress-testing our systems with property-based testing and deterministic simulation, not just because it makes our software more reliable, but because it actually makes us faster.
Show HN: Donobu – Mac app turns prompts into deterministic browser tests (ycombinator.com)
Hi HN, we’re Vasusen and Justin, and we’re building Donobu (https://www.donobu.com), a Mac desktop app. It turns prompts like “ensure onboarding works” into reliable browser tests, with optional AI (BYOK). It’s local-first, privacy-focused, and built with insights from our Coursera days—where testing hundreds of features across thousands of pages was a nightmare.
Just write a test for it (kobzol.github.io)
This is a short appreciation post about Rust continuously guiding me towards doing The Right Thing™.
Testing Without Mocks: A Pattern Language (2023) (jamesshore.com)
Dead Simple Snapshot Testing in Zig (kristoff.it)
I’ve recently added snapshot testing support to Zine, my static site generator, and I’ll share here how to get a similar setup going for your projects.
Verification-First Development (buttondown.com)
A while back I argued on the Blue Site that "test-first development" (TFD) was different than "test-driven development" (TDD). The former is "write tests before you write code", the latter is a paradigm, culture, and collection of norms that's based on TFD. More broadly, TFD is a special case of Verification-First Development and TDD is not.
Testing Begins for Community Notes on Facebook, Instagram and Threads (about.fb.com)
In January, Meta announced that we will end our third party fact checking program and move to a crowd-sourced Community Notes approach, starting in the United States. On March 18th, we will begin testing this new approach by allowing contributors from our community to write and rate notes on content across Facebook, Instagram and Threads.
Show HN: Testeranto – the AI driven test framework for TypeScript projects (npmjs.com)
🚧 WARNING: Testeranto is still under development and is not ready for production yet. 🚧
Problems with New California Bar Exam Enrage Test Takers and Cloud Their Futures (nytimes.com)
Even under normal circumstances, the California bar exam is one final harrowing hurdle before aspiring lawyers can practice. But last week was worse than any other, as they were thrown into limbo by technical glitches, delays and what many said were bizarrely written questions on a revamped test that didn’t match anything in preparation.
Demystifying monads in Rust through property-based testing (sunshowers.io)
In programming pedagogy, monads have a place as a mystical object from the functional programming world that’s hard to understand and even harder to explain.
Ask HN: Have you written or consumed OpenAPI Arazzo specification? (ycombinator.com)
Hi HN, I recently came across the Arazzo specification within the OpenAPI initiative. It defines a standard to express workflows involving multiple APIs. A couple of main use cases here are automatic API invocation with LLMs and automated tests. I want to auto generate API tests based on these specs and would like to see other examples. Is anyone here writing or consuming Arazzo specs? Reference: https://www.openapis.org/arazzo
Mere weeks after Starship's breakup, the vehicle may soon fly again (arstechnica.com)
A little over a month after SpaceX's large Starship launch ended in an explosion over several Caribbean islands, the company is preparing its next rocket for a test flight.
Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps (ycombinator.com)
Hi HN - we're Jeffrey and Kritin, and we're building Confident AI (https://confident-ai.com). This is the cloud platform for DeepEval (https://github.com/confident-ai/deepeval), our open-source package that helps engineers evaluate and unit-test LLM applications. Think Pytest for LLMs.