Terminal-Bench: a benchmark for AI agents in terminal environments
(tbench.ai)
terminal-bench is a collection of tasks and an evaluation harness to help agent makers quantify their agents' terminal mastery.