Hacker News with Generative AI: Language Models

AMD Announces "Instella" Open-Source 3B Language Models (phoronix.com)
AMD Announces "Instella" Fully Open-Source 3B Language Models
Cognitive Behaviors That Enable Self-Improving Reasoners (arxiv.org)
Test-time inference has emerged as a powerful paradigm for enabling language models to "think" longer and more carefully about complex challenges, much like skilled human experts.
Agno: Agent framework 10,000x faster than LangChain (agno.com)
Agno is a lightweight library for building Multimodal Agents.
Two AIs Realize They Are Not Talking to Humans and Switch to Their Own Language (iflscience.com)
A video that has gone viral in the last few days shows two artificial intelligence (AI) agents having a conversation before switching to another mode of communication when they realize no human is part of the conversation.
Claude and Alexa+ (anthropic.com)
Today, we're announcing that Claude models are helping power Alexa+.
Apple's Dictation System Transcribes the Word 'Racist' as 'Trump' (nytimes.com)
While using Apple’s automatic dictation feature to send messages on Tuesday, some iPhone users reported seeing a peculiar bug: the word “racist” temporarily appearing as “Trump,” before quickly correcting itself.
What leaders need to know about small language models (SLMs) (pieces.app)
Small language models are rising in popularity for their efficiency, security, accuracy, and ability to be customized for specific AI applications.
Apple to fix iPhone dictation bug that replaces word 'racist' with 'Trump' (theguardian.com)
Apple has promised to fix a bug in its iPhone automatic dictation tool after some users reported it had suggested to them “Trump” when they said the word “racist”.
Free, Unlimited Access to Think Deeper and Voice (microsoft.com)
We launched Copilot two years ago, focused on helping people access knowledge, get answers, reflect, brainstorm and create. As we continue to build your ultimate AI companion, today we’re excited to start rolling out even more powerful capabilities to all Copilot users with free, unlimited access to Voice and Think Deeper (powered by OpenAI’s o1 model).
Claude 3.7 Sonnet and Claude Code (anthropic.com)
Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market.
Using LLMs effectively isn't about prompting (seangoedecke.com)
When people talk about using language models effectively, they mainly talk about prompting: sharing great prompts, lists of tips for prompting, or courses on becoming a “prompt engineer”. It’s true that prompting is a surprisingly effective way to get more out of LLMs. Small variations in prompts can make a big difference in the LLM output. There really are general rules (put your question first and your context last, for instance). However, when I use LLMs, I rarely think about prompting.
Are you conscious? A conversation between Richard Dawkins and ChatGPT (richarddawkins.substack.com)
Is AI truly conscious, or just an advanced illusion of thought? Richard Dawkins shares his conversation with ChatGPT, which displays the depths of machine intelligence and its success in passing the Turing Test for consciousness.
OpenEuroLLM (openeurollm.eu)
Europe's leading AI companies and research institutions combine their forces and expertise to develop next-generation open-source language models in an unprecedented collaboration to advance European AI capabilities, the OpenEuroLLM project
Helix: A vision-language-action model for generalist humanoid control (figure.ai)
We're introducing Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics.
DeepSeek Native Sparse Attention (arxiv.org)
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges.
Mistral Saba (mistral.ai)
Making AI ubiquitous requires addressing every culture and language. As AI proliferates globally, many of our customers worldwide have expressed a strong desire for models that are not just fluent but native to regional parlance.
Is ChatGPT autocomplete bad UX/UI? (honzabe.com)
I get it. I am not the only user the world revolves around. When an app does not behave the way I would prefer, it’s probably because most people have different preferences, and the app is optimized for them.
A woman made her AI voice clone say "arse." Then she got banned (technologyreview.com)
People with motor neuron disease should be allowed to say whatever they want, including “arse” and “knickers.”
Surrealist Compliment Generator (madsci.org)
May your succulent earlobes ever flap about my knees like a thousand wooden pigeons fleeing the local sawmill.
PlayAI's new Dialog model achieves 3:1 preference in human evals (play.ht)
PlayAI’s Dialog Text-to-Speech model is now in general availability, bringing multilingual capabilities and exceptional performance to applications requiring emotive, human-like speech. In recent third-party benchmark tests, Dialog was preferred by 10:1 vs. ElevenLabs v2.5 Turbo, and by over 3:1 vs. ElevenLabs Multilingual v2.0. Play the video below to find out what it sounds like, or visit our AI voiceover Studio to try it for yourself.
OpenAI used this subreddit to test AI persuasion (techcrunch.com)
OpenAI used the subreddit, r/ChangeMyView, to create a test for measuring the persuasive abilities of its AI reasoning models.
Alibaba Qwen: AI model that writes, generates images/videos, and does web search (twitter.com)
Selene Mini: Open-sourced SOTA small language-model-as-a-judge (huggingface.co)
Atla Selene Mini is a state-of-the-art small language model-as-a-judge (SLMJ). Selene Mini achieves comparable performance to models 10x its size, outperforming GPT-4o on RewardBench, EvalBiasBench, and AutoJ.
I do not want AI to "polish" me (thebloggess.com)
I was sending an email when a little magic wand popped up that said “Polish” and I thought that was weird because why would I want to translate my email into Polish?
DeepSeek demonstrates pro-Chinese bias (medium.com)
DeepSeek is a wonderful step in the development of open AI approaches. It also has a pretty serious pro-Chinese bias. I compare the results of 3 sensitive questions (about Gaza, Xinjiang and TikTok) and on all three, the Chinese bias is pretty apparent while existing tools (ChatGPT, Gemini) are far more balanced. In two instances, it used the pronoun “we” to describe the Chinese position, which suggests lots of training data that associates “we” with the Chinese.
The DeepSeek panic reveals an AI world ready to blow (theguardian.com)
The arrival of DeepSeek R1, an AI language model built by the Chinese AI lab DeepSeek, has been nothing less than seismic.
Translation using deep neural networks (part 1) (aamster.github.io)
In this article, I’ll introduce language modeling using deep learning and will focus on the problem of translation.
OpenAI's o1 Playing Codenames (suveenellawela.com)
I got two teams of OpenAI's o1 models to play the boardgame, Codenames, and they didn't disappoint.