Terminal-Bench: a benchmark for AI agents in terminal environments (tbench.ai)
Terminal-Bench is a collection of tasks and an evaluation harness that helps agent makers quantify their agents' terminal mastery.
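To make the "tasks plus harness" idea concrete, here is a minimal sketch of how such a setup could score an agent. It is not Terminal-Bench's actual API or task format: the names `Task`, `evaluate`, and `toy_agent` are hypothetical, and the sketch assumes an agent is simply a function that maps an instruction to a single shell command. A real harness would normally isolate each task (for example in a container) rather than run commands in a temporary directory on the host.

```python
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


# Hypothetical types for illustration only; not Terminal-Bench's real interface.
@dataclass
class Task:
    task_id: str
    instruction: str                 # what the agent is asked to do
    check: Callable[[Path], bool]    # inspects the working directory afterwards


# Assumption: an agent maps an instruction to one shell command.
Agent = Callable[[str], str]


def evaluate(agent: Agent, tasks: List[Task], timeout_s: float = 10.0) -> Dict[str, object]:
    """Run each task in a fresh temporary directory and report the pass rate."""
    results: Dict[str, bool] = {}
    for task in tasks:
        with tempfile.TemporaryDirectory() as tmp:
            workdir = Path(tmp)
            try:
                command = agent(task.instruction)
                # Execute the agent's command inside the scratch directory.
                subprocess.run(command, shell=True, cwd=workdir,
                               timeout=timeout_s, capture_output=True)
                results[task.task_id] = bool(task.check(workdir))
            except Exception:
                # Timeouts, crashes, and failing checks all count as unresolved.
                results[task.task_id] = False
    passed = sum(results.values())
    accuracy = passed / len(tasks) if tasks else 0.0
    return {"results": results, "accuracy": accuracy}


def greeting_written(workdir: Path) -> bool:
    target = workdir / "greeting.txt"
    return target.is_file() and target.read_text().strip() == "hello"


tasks = [
    Task("write-greeting", "Create greeting.txt containing the word hello", greeting_written),
    Task("make-dir", "Create a directory named out", lambda d: (d / "out").is_dir()),
]


def toy_agent(instruction: str) -> str:
    # A deliberately weak agent: it always answers with the same command.
    return "echo hello > greeting.txt"


report = evaluate(toy_agent, tasks)
print(report)
```

The toy agent resolves the first task and misses the second, so the harness reports a pass rate over the two tasks. The single number is what lets one agent be compared against another on the same task set.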