Hacker News with Generative AI: Testing

The right way to do data fixtures in Go (brandur.org)
Every test suite should start early in building a strong convention to generate data fixtures. If it doesn’t, data fixtures will still emerge (they’re that necessary), but in a way that’s poorly designed, with no API (or a poorly designed one), and not standardized.
Our own worst best customer (antithesis.com)
At Antithesis, our job is to break software before it breaks in production – ours included. We’ve spent years stress-testing our systems with property-based testing and deterministic simulation, not just because it makes our software more reliable, but because it actually makes us faster.
Show HN: Donobu – Mac app turns prompts into deterministic browser tests (ycombinator.com)
Hi HN, we’re Vasusen and Justin, and we’re building Donobu (https://www.donobu.com), a Mac desktop app. It turns prompts like “ensure onboarding works” into reliable browser tests, with optional AI (BYOK). It’s local-first, privacy-focused, and built with insights from our Coursera days—where testing hundreds of features across thousands of pages was a nightmare.
Just write a test for it (kobzol.github.io)
This is a short appreciation post about Rust continuously guiding me towards doing The Right Thing™.
Testing Without Mocks: A Pattern Language (2023) (jamesshore.com)
Dead Simple Snapshot Testing in Zig (kristoff.it)
I’ve recently added snapshot testing support to Zine, my static site generator, and I’ll share here how to get a similar setup going for your projects.
Verification-First Development (buttondown.com)
A while back I argued on the Blue Site that "test-first development" (TFD) was different than "test-driven development" (TDD). The former is "write tests before you write code", the latter is a paradigm, culture, and collection of norms that's based on TFD. More broadly, TFD is a special case of Verification-First Development and TDD is not.
Testing Begins for Community Notes on Facebook, Instagram and Threads (about.fb.com)
In January, Meta announced that we will end our third party fact checking program and move to a crowd-sourced Community Notes approach, starting in the United States. On March 18th, we will begin testing this new approach by allowing contributors from our community to write and rate notes on content across Facebook, Instagram and Threads.
Show HN: Testeranto – the AI driven test framework for TypeScript projects (npmjs.com)
🚧 WARNING: Testeranto is still under development and is not ready for production yet. 🚧
Problems with New California Bar Exam Enrage Test Takers and Cloud Their Futures (nytimes.com)
Even under normal circumstances, the California bar exam is one final harrowing hurdle before aspiring lawyers can practice. But last week was worse than any other, as they were thrown into limbo by technical glitches, delays and what many said were bizarrely written questions on a revamped test that didn’t match anything in preparation.
Demystifying monads in Rust through property-based testing (sunshowers.io)
In programming pedagogy, monads have a place as a mystical object from the functional programming world that’s hard to understand and even harder to explain.
Ask HN: Have you written or consumed OpenAPI Arazzo specification? (ycombinator.com)
Hi HN, I recently came across the Arazzo specification within the OpenAPI initiative. It defines a standard to express workflows involving multiple APIs. A couple of main use cases here are automatic API invocation with LLMs and automated tests. I want to auto generate API tests based on these specs and would like to see other examples. Is anyone here writing or consuming Arazzo specs? Reference: https://www.openapis.org/arazzo
Mere weeks after Starship's breakup, the vehicle may soon fly again (arstechnica.com)
A little over a month after SpaceX's large Starship launch ended in an explosion over several Caribbean islands, the company is preparing its next rocket for a test flight.
Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps (ycombinator.com)
Hi HN - we're Jeffrey and Kritin, and we're building Confident AI (https://confident-ai.com). This is the cloud platform for DeepEval (https://github.com/confident-ai/deepeval), our open-source package that helps engineers evaluate and unit-test LLM applications. Think Pytest for LLMs.
Show HN: OpenAstra – Chat based open-source alternative to Postman (github.com/srikanth235)
A chat-based open source development platform for API discovery and testing.
Launch HN: Roark (YC W25) – Taking the pain out of voice AI testing (ycombinator.com)
Hey HN, we’re James and Daniel, co-founders of Roark (https://roark.ai). We built a tool that lets developers replay real production calls against their latest Voice AI changes, so they can catch failures, test updates, and iterate with confidence.
Flea-Scope: $18 Source Available USB Oscilloscope, Logic Analyzer and More [pdf] (rtestardi.github.io)
The Big TDD Misunderstanding (2022) (linkedrecords.com)
Rumors have it that the term “unit” in “unit test” originally referred to the test itself, not to a unit of the system under test. The idea was that the test could be executed as one unit and does not rely on other tests running upfront. Another contradictive perspective is this one: “The unit to be tested is the entire point of confusion and debate.
Auto-Rewind for Daily Test (Apache NuttX RTOS) (lupyuen.org)
If the Daily Test fails for Apache NuttX RTOS … Can we Auto-Rewind and discover the Breaking Commit? Let’s try this.
Show HN: CodeCapy – A PR bot that tests your code (github.com/Scrapybara)
CodeCapy automatically detects new PRs, generates natural language end-to-end UI tests based on code changes, executes tests in isolated Scrapybara instances, posts test results to PR comments, and more.
Yoodio generative radio stations app – looking for testers (youtube.com)
Automating Git Bisect with Ephemeral Environments (qckfx.com)
Composable SQL (borretti.me)
SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.
Tool touted as 'first AI software engineer' is bad at its job, testers claim (theregister.com)
A service described as "the first AI software engineer" appears to be rather bad at its job, based on a recent evaluation.
Embedding Python in Rust (For Tests) (enterprisedb.com)
The latest generation of programming languages (Rust, Go, Zig) comes bundled not just with a standard library but with a suite of first-party tools for working with the code itself (e.g. cargo fmt, gofmt, zig fmt). But I suspect that some future generation of (statically typed) programming languages will also come with a first-party embedded scripting language to make it easier to write tests. Until then though, third-party embedded scripting languages can be convenient.
Show HN: Fixa – an open source Python package for testing voice agents (github.com/fixadev)
fixa is a python package for testing and evaluating AI voice agents.
Show HN: Pytest-evals – Simple LLM apps evaluation using pytest (github.com/AlmogBaku)
Test your LLM outputs against examples - no more manual checking! A (minimalistic) pytest plugin that helps you to evaluate that your LLM is giving good answers.
Guided by the beauty of our test suite (mattkeeter.com)
This is a story about making software better.
IQ is largely a pseudoscientific swindle (2019) (medium.com)
“IQ” is a stale test meant to measure mental capacity but in fact mostly measures extreme unintelligence (learning difficulties), as well as, to a lesser extent (with a lot of noise), a form of intelligence, stripped of 2nd order effects — how good someone is at taking some type of exams designed by unsophisticated nerds.