Hacker News with Generative AI: Accuracy

30% drop in o1-preview accuracy when Putnam problems are slightly varied (openreview.net)
As large language models (LLMs) continue to advance, many existing benchmarks designed to evaluate their reasoning capabilities are becoming saturated.
Elektročas HH3 – the most accurate pendulum clock on the planet (cern.ch)
Can a pendulum clock be good to one second in 158 million years? A piece of cake...
We fine-tuned Llama and got 4.2x Sonnet 3.5 accuracy for code generation (finecodex.com)
How can this 6-axis robot have a static accuracy of 0.05 mm? (2021) [video] (youtube.com)
Can the New Mathstral LLM Accurately Compare 9.11 and 9.9? (secondstate.io)
“Imprecise” language models are smaller, speedier, and nearly as accurate (ieee.org)
Improving GPT-4 Function Calling Accuracy (composio.dev)