Hacker News with Generative AI: Testing

Launch HN: Confident AI (YC W25) – Open-source evaluation framework for LLM apps (ycombinator.com)
Hi HN - we're Jeffrey and Kritin, and we're building Confident AI (https://confident-ai.com). This is the cloud platform for DeepEval (https://github.com/confident-ai/deepeval), our open-source package that helps engineers evaluate and unit-test LLM applications. Think Pytest for LLMs.
Demystifying monads in Rust through property-based testing (sunshowers.io)
In programming pedagogy, monads have a place as a mystical object from the functional programming world that’s hard to understand and even harder to explain.
Show HN: OpenAstra – Chat based open-source alternative to Postman (github.com/srikanth235)
A chat-based open source development platform for API discovery and testing.
Launch HN: Roark (YC W25) – Taking the pain out of voice AI testing (ycombinator.com)
Hey HN, we’re James and Daniel, co-founders of Roark (https://roark.ai). We built a tool that lets developers replay real production calls against their latest Voice AI changes, so they can catch failures, test updates, and iterate with confidence.
Flea-Scope: $18 Source Available USB Oscilloscope, Logic Analyzer and More [pdf] (rtestardi.github.io)
The Big TDD Misunderstanding (2022) (linkedrecords.com)
Rumors have it that the term “unit” in “unit test” originally referred to the test itself, not to a unit of the system under test. The idea was that the test could be executed as one unit and does not rely on other tests running upfront (see here and here). Another contradictory perspective is this one: “The unit to be tested is the entire point of confusion and debate.
Auto-Rewind for Daily Test (Apache NuttX RTOS) (lupyuen.org)
If the Daily Test fails for Apache NuttX RTOS … Can we Auto-Rewind and discover the Breaking Commit? Let’s try this.
Show HN: CodeCapy – A PR bot that tests your code (github.com/Scrapybara)
CodeCapy automatically detects new PRs, generates natural language end-to-end UI tests based on code changes, executes tests in isolated Scrapybara instances, posts test results to PR comments, and more.
Yoodio generative radio stations app – looking for testers (youtube.com)
Automating Git Bisect with Ephemeral Environments (qckfx.com)
Composable SQL (borretti.me)
SQL could be improved somewhat by introducing composable query fragments with statically-typed interfaces. I begin by explaining two areas (testing and reusing business logic) where SQL does very poorly. Then I explain my solution, and how it addresses the problems.
Tool touted as 'first AI software engineer' is bad at its job, testers claim (theregister.com)
A service described as "the first AI software engineer" appears to be rather bad at its job, based on a recent evaluation.
Embedding Python in Rust (For Tests) (enterprisedb.com)
The latest generation of programming languages (Rust, Go, Zig) comes bundled not just with a standard library but with a suite of first-party tools for working with the code itself (e.g. cargo fmt, gofmt, zig fmt). But I suspect that some future generation of (statically typed) programming languages will also come with a first-party embedded scripting language to make it easier to write tests. Until then though, third-party embedded scripting languages can be convenient.
Show HN: Fixa – an open source Python package for testing voice agents (github.com/fixadev)
fixa is a python package for testing and evaluating AI voice agents.
Show HN: Pytest-evals – Simple LLM apps evaluation using pytest (github.com/AlmogBaku)
Test your LLM outputs against examples - no more manual checking! A (minimalistic) pytest plugin that helps you to evaluate that your LLM is giving good answers.
Guided by the beauty of our test suite (mattkeeter.com)
This is a story about making software better.
IQ is largely a pseudoscientific swindle (2019) (medium.com)
“IQ” is a stale test meant to measure mental capacity but in fact mostly measures extreme unintelligence (learning difficulties), as well as, to a lesser extent (with a lot of noise), a form of intelligence, stripped of 2nd order effects — how good someone is at taking some type of exams designed by unsophisticated nerds.
The testing pyramid is an outdated economic model (wiremock.io)
The testing pyramid is one of the most famous concepts in agile. But how relevant is it for developers today?
How rqlite is tested (philipotoole.com)
rqlite is a lightweight, open-source, distributed relational database built on SQLite and Raft. With its origins dating back to 2014, its design has always prioritized reliability and quality. Testing plays a foundational role in achieving these qualities, shaping the implementation and guaranteeing robustness.
Snyk Security Labs Testing Update: Cursor.com AI Code Editor (snyk.io)
Snyk’s Security Labs team aims to find and help mitigate vulnerabilities in software used by developers around the world, with an overarching goal to improve the state of software security.
Why Rust nextest is process-per-test (sunshowers.io)
I’m often asked why the Rust test runner I maintain, cargo-nextest, runs every test in a separate process. Here’s my best attempt at explaining the rationale behind it.
Documentation is more important than tests (anonel.substack.com)
Without the intention of making a click-bait-y title, I really think docs and tests are almost equal in importance, but if a company would only have resources to do one, they should choose documentation.
Power up and tear down of a Rohde and Schwarz SKTU BN 4151/2/5 noise generator (makertube.net)
Ruby-refrigerator: Freeze all core Ruby classes (github.com/jeremyevans)
Refrigerator offers an easy way to freeze all ruby core classes and modules. It’s designed to be used in production and when testing to make sure that no code is making unexpected changes to core classes or modules at runtime.
Database Release and End-to-End Testing: ClickHouse Database Cloning (notion.site)
Writing and testing a paginated API iterator in Go (thibaut-rousseau.com)
Go 1.23, amongst other features, brought Iterators to the standard library.
Webhook Tester/Debugger (hooklistener.com)
Optimize, Test, and Debug Webhooks with Precision
Testing for Thermal Issues Becomes More Difficult (semiengineering.com)
Increasingly complex and heterogeneous architectures, coupled with the adoption of high-performance materials, are making it much more difficult to identify and test for thermal issues in advanced packages.
Make your QEMU faster (2022) (schreibt.jetzt)
NixOS uses virtual machines based on QEMU extensively for running its test suite. In order to avoid generating a disk image for every test, the test driver usually boots using a Plan 9 File Protocol (9p) share (server implemented by QEMU) for the Nix store, which contains all the programs and config necessary for the test.
SeleniumBase: Python APIs for web automation and bypassing bot-detection (github.com/seleniumbase)
Python APIs for web automation, testing, and bypassing bot-detection.