Hacker News with Generative AI: Language Models

Show HN: Qwen-2.5-32B is now the best open source OCR model (github.com/getomni-ai)
A benchmarking tool that compares the OCR and data extraction capabilities of large multimodal models such as gpt-4o, evaluating both text and JSON extraction accuracy. The goal is to publish a comprehensive benchmark of OCR accuracy across traditional OCR providers and multimodal language models. The evaluation dataset and methodologies are all open source, and we encourage expanding this benchmark to encompass additional providers.
Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic) (transformer-circuits.pub)
We introduce a method to uncover mechanisms underlying behaviors of language models. We produce graph descriptions of the model’s computation on prompts of interest by tracing individual computational steps in a “replacement model”. This replacement model substitutes a more interpretable component (here, a “cross-layer transcoder”) for parts of the underlying model (here, the multi-layer perceptrons) that it is trained to approximate.
What Anthropic Researchers Found After Reading Claude's 'Mind' Surprised Them (singularityhub.com)
Despite popular analogies to thinking and reasoning, we have a very limited understanding of what goes on in an AI’s “mind.”
Welcome to the Semantic Apocalypse (theintrinsicperspective.com)
An awful personal prophecy is coming true. Way back in 2019, when AI was still a relatively niche topic, and only the primitive GPT-2 had been released, I predicted the technology would usher in a “semantic apocalypse” wherein art and language were drained of meaning. In fact, it was the first essay ever posted here on The Intrinsic Perspective.
The Great Chatbot Debate (computerhistory.org)
Chatbots based on large language models (LLMs), like ChatGPT, answer sophisticated questions, pass professional exams, analyze texts, generate everything from poems to computer programs, and more. But is there genuine understanding behind what LLMs can do? Do they really understand our world? Or, are they a triumph of mathematics and masses of data and calculations simulating true understanding?
Qwen2.5-VL-32B: Smarter and Lighter (qwenlm.github.io)
At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community.
Semantic Diffusion (simonwillison.net)
Semantic Diffusion. I learned about this term today while complaining about how the definition of "vibe coding" is already being distorted to mean "any time an LLM writes code" as opposed to the intended meaning of "code I wrote with an LLM without even reviewing what it wrote".
ChatGPT can't kill anything worth preserving (biblioracle.substack.com)
It’s not every week that someone with my particular employment profile and expertise has something they’re knowledgeable about become a hot topic of national discussion, but the release of OpenAI’s ChatGPT interface generated a sudden flurry of discussion about how we teach students to write in school, which is something I know a lot about.
MCP – Flash in the Pan or Future Standard? (langchain.dev)
Model Context Protocol (MCP) is creating quite the stir on Twitter – but is it actually useful, or just noise? In this back and forth, Harrison Chase (LangChain CEO) and Nuno Campos (LangGraph Lead) debate whether MCP lives up to the hype.
LangManus: An Open-Source Manus Agent with LangChain + LangGraph (github.com/langmanus)
LangManus is a community-driven AI automation framework that builds upon the incredible work of the open source community. Our goal is to combine language models with specialized tools for tasks like web search, crawling, and Python code execution, while giving back to the community that made this possible.
Fine-tune Google's Gemma 3 (unsloth.ai)
Gemma 3, Google's new state-of-the-art multimodal (text + image) models, come in 1B, 4B, 12B, and 27B sizes. Now supported in Unsloth, Gemma 3 has a 128K context window and multilingual support.
Entropy is all you need? The quest for best tokens and the new physics of LLMs (cern.ch)
LLMs currently generate texts one token at a time with fixed hyperparameters.
Block Diffusion: Interpolating between autoregressive and diffusion models (arxiv.org)
Diffusion language models offer unique benefits over autoregressive models due to their potential for parallelized generation and controllability, yet they lag in likelihood modeling and are limited to fixed-length generation.
Speaking Things into Existence (oneusefulthing.org)
Influential AI researcher Andrej Karpathy wrote two years ago that “the hottest new programming language is English,” a topic he expanded on last month with the idea of “vibecoding,” a practice where you just ask an AI to create something for you, giving it feedback as it goes.
A Reddit moderation tool is flagging 'Luigi' as potentially violent content (theverge.com)
Reddit’s automatic moderation tool is flagging the word “Luigi” as potentially violent — even when the content isn’t.
Understanding Transformers (beyond the Math) – kalomaze's kalomazing blog (bearblog.dev)
Maybe you don't want to attempt the conventional approaches for understanding the transformer architecture for language models. If you're anything like me, an informal approach is what you'd prefer - one that helps you reason about what's happening with these models in the abstract, without requiring mastery on the technical level to begin with.
A unified acoustic-to-speech-to-language embedding space (nature.com)
This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain.
Model Context Protocol (MCP) (nshipster.com)
Language Server Protocol (LSP) revolutionized how programming languages integrate with developer tools. Model Context Protocol (MCP) aims to do the same for a new generation of AI tools.
AMD Announces "Instella" Open-Source 3B Language Models (phoronix.com)
AMD Announces "Instella" Fully Open-Source 3B Language Models
Cognitive Behaviors That Enable Self-Improving Reasoners (arxiv.org)
Test-time inference has emerged as a powerful paradigm for enabling language models to “think” longer and more carefully about complex challenges, much like skilled human experts.
Agno: Agent framework 10,000x faster than LangChain (agno.com)
Agno is a lightweight library for building Multimodal Agents.
Two AIs Realize They Are Not Talking to Humans and Switch to Their Own Language (iflscience.com)
A video that has gone viral in the last few days shows two artificial intelligence (AI) agents having a conversation before switching to another mode of communication when they realize no human is part of the conversation.
Claude and Alexa+ (anthropic.com)
Today, we're announcing that Claude models are helping power Alexa+.
Apple's Dictation System Transcribes the Word 'Racist' as 'Trump' (nytimes.com)
While using Apple’s automatic dictation feature to send messages on Tuesday, some iPhone users reported seeing a peculiar bug: the word “racist” temporarily appearing as “Trump,” before quickly correcting itself.
What leaders need to know about small language models (SLMs) (pieces.app)
Small language models are rising in popularity for their efficiency, security, accuracy, and ability to be customized for specific AI applications.
Apple to fix iPhone dictation bug that replaces word 'racist' with 'Trump' (theguardian.com)
Apple has promised to fix a bug in its iPhone automatic dictation tool after some users reported it had suggested to them “Trump” when they said the word “racist”.
Free, Unlimited Access to Think Deeper and Voice (microsoft.com)
We launched Copilot two years ago, focused on helping people access knowledge, get answers, reflect, brainstorm and create. As we continue to build your ultimate AI companion, today we’re excited to start rolling out even more powerful capabilities to all Copilot users with free, unlimited access to Voice and Think Deeper (powered by OpenAI’s o1 model).
Claude 3.7 Sonnet and Claude Code (anthropic.com)
Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market.