Hacker News with Generative AI: Generative AI

AI Coding Assistant Is Gaslighting You – The Hidden Cost of Uncertainty (jc.tt)
The recent Devin review by Answer.AI highlights a critical but often overlooked aspect of AI coding assistants: their maddening unpredictability.
She Is in Love with ChatGPT (nytimes.com)
Rumor About GPT-5 Changes Everything (thealgorithmicbridge.com)
What if I told you that GPT-5 is real? Not just real, but already shaping the world, from where you can’t see it.
Replit CEO on AI breakthroughs: We don't care about professional coders anymore (semafor.com)
Replit has had a turbulent year, but CEO Amjad Masad’s sonorous voice was almost zen-like as he spoke to me on Monday in an airy conference room, sipping coconut water with a view of the sun setting over Foster City, California.
Adobe's new AI tool can edit 10k images in one click (theverge.com)
Adobe is launching new generative AI tools that can automate labor-intensive production tasks like editing large batches of images and translating video presentations.
MatterGen: A new paradigm of materials design with generative AI (microsoft.com)
Materials innovation is one of the key drivers of major technological breakthroughs.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget (github.com/SonyResearch)
This repository provides a minimalistic implementation of our approach to training large-scale diffusion models from scratch on an extremely low budget.
Amazon races to transplant Alexa's 'brain' with generative AI (ft.com)
4M Tokens Context Model (github.com/MiniMax-AI)
Gemini Advanced now included in Google Workspace (workspace.google.com)
We believe AI is foundational to the future of work and its transformative power should be accessible to every business and every employee, at an affordable price.
Generative AI is a Parasitic Cancer [video] (youtube.com)
OpenAI's AI reasoning model 'thinks' in Chinese sometimes, no one knows why (techcrunch.com)
Shortly after OpenAI released o1, its first “reasoning” AI model, people began noting a curious phenomenon. The model would sometimes begin “thinking” in Chinese, Persian, or some other language — even when asked a question in English.
Generate audiobooks from E-books with Kokoro-82M (claudio.uk)
Kokoro v0.19 is a recently published text-to-speech model with just 82M params and very high-quality output.
Transformer^2: Self-Adaptive LLMs (sakana.ai)
Adaptation is one of the most remarkable phenomena in nature. Consider how an octopus can change its skin color to blend into its surroundings, or how the human brain rewires itself after an injury, allowing individuals to recover lost functions and adapt to new ways of thinking or moving. Living organisms exhibit adaptability that allows life to flourish in diverse and ever-changing environments.
Longest context up to 4M: MiniMax-01, a hybrid 456B open-source model (github.com/MiniMax-AI)
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token.
The Two Word Test as a semantic benchmark for large language models (nature.com)
Large language models (LLMs) have shown remarkable abilities recently, including passing advanced professional exams and demanding benchmark tests.
Show HN: Value likelihoods for OpenAI structured output (arena-ai.github.io)
structured-logprobs is an open-source Python library that enhances OpenAI's structured outputs by providing detailed information about token log probabilities.
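The idea behind per-value likelihoods can be sketched with plain probability arithmetic: if a field value in a structured response spans several tokens, its joint probability is the product of the per-token probabilities, i.e. the exponential of the summed log probabilities. This is a minimal sketch assuming you already have the log probabilities of the tokens that make up one value; `value_probability` is a hypothetical helper, not the structured-logprobs API or part of the OpenAI SDK.

```python
import math


def value_probability(token_logprobs: list[float]) -> float:
    """Joint probability of a multi-token value (hypothetical helper, not a library API).

    Summing log probabilities equals taking the log of the product of the
    per-token probabilities, so exp(sum) recovers that product.
    """
    return math.exp(sum(token_logprobs))


# Hypothetical logprobs for the two tokens that spell one JSON field value.
field_tokens = [math.log(0.9), math.log(0.8)]
print(round(value_probability(field_tokens), 2))  # 0.72
```

Working in log space avoids underflow when a value spans many low-probability tokens; the exponentiation happens only once at the end.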
Cheating Is All You Need (sourcegraph.com)
There is something legendary and historic happening in software engineering, right now as we speak, and yet most of you don’t realize at all how big it is.
Voyage-code-3 (voyageai.com)
TL;DR – Introducing voyage-code-3, our next-generation embedding model optimized for code retrieval. It outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81% on a suite of 32 code retrieval datasets, respectively. By supporting smaller dimensions with Matryoshka learning and quantized formats like int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality.
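The storage savings mentioned above come from two generic techniques, sketched here under stated assumptions: Matryoshka-style truncation (keep only the leading dimensions of an embedding and re-normalize) and binary quantization (one bit per dimension, compared with Hamming distance). For example, a 1024-dimensional float32 vector takes 4,096 bytes, int8 takes 1,024 bytes, and a binary code takes 128 bytes. The helper names below are hypothetical and this is not the Voyage AI API; it only illustrates the arithmetic on toy vectors.

```python
import math


def truncate_and_normalize(vec: list[float], dim: int) -> list[float]:
    # Matryoshka-style truncation: keep the leading `dim` coordinates,
    # then rescale to unit length so cosine/dot-product scores stay comparable.
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def binarize(vec: list[float]) -> int:
    # Binary quantization: bit i is 1 if coordinate i is positive,
    # packed into a single Python int.
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming_distance(a: int, b: int) -> int:
    # Distance between two binary codes: the number of differing bits.
    return bin(a ^ b).count("1")


# Toy 6-dimensional "embeddings" (hypothetical values), truncated to 4 dimensions.
query = binarize(truncate_and_normalize([0.3, -0.2, 0.5, 0.1, -0.4, 0.2], 4))
doc = binarize(truncate_and_normalize([0.25, -0.1, 0.4, -0.05, -0.3, 0.1], 4))
print(query, doc, hamming_distance(query, doc))  # 13 5 1
```

Re-normalizing after truncation is a common choice because a truncated prefix no longer has unit length; it does not change the signs, so the binary codes are the same either way.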
VideoRAG: Retrieval-Augmented Generation over Video Corpus (arxiv.org)
Retrieval-Augmented Generation (RAG) is a powerful strategy to address the issue of generating factually incorrect outputs in foundation models by retrieving external knowledge relevant to queries and incorporating it into their generation process.
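The retrieve-then-generate loop described above can be sketched in a few lines: rank stored items by similarity to the query embedding, then place the top matches in the prompt ahead of the question. This is a generic text-RAG sketch, not VideoRAG's method (which retrieves videos); the embeddings are hypothetical toy vectors, and no real embedding model or LLM is called.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: list[float],
             corpus: list[tuple[str, list[float]]],
             k: int = 2) -> list[str]:
    # Rank (text, embedding) pairs by similarity to the query, keep the top k texts.
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    # Ground the generator by putting retrieved passages before the question.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {question}"


corpus = [
    ("Paris is the capital of France.", [1.0, 0.0]),
    ("Berlin is the capital of Germany.", [0.0, 1.0]),
    ("France borders Spain.", [0.8, 0.2]),
]
passages = retrieve([1.0, 0.1], corpus, k=2)
print(build_prompt("What is the capital of France?", passages))
```

In a real system the toy vectors would come from an embedding model and the prompt would be sent to a generator; the structure of the loop stays the same.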
Sky-T1: Train your own o1-preview-level model for under $450 (novasky-ai.github.io)
We introduce Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks. Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently. All code is open-source.
o1 isn't a chat model (and that's the point) (latent.space)
How did I go from hating o1 to using it every day for my most important questions?
Contemplative LLMs (bearblog.dev)
Recently, I posted a prompt on X (formerly Twitter) for large language models such as Claude Sonnet, GPT-4o, and DeepSeek V3. The prompt instructs these models to 'contemplate' for a bit before providing the final answer, and it unexpectedly went viral. This short blog post explains the thinking behind the prompt.
Maybe ChatGPT has some pre-frontal cortex problems (solresol.substack.com)
People have been complaining that ChatGPT has been degrading with each new version. This sounds like cognitive decline! Let’s administer some tests that might detect incipient dementia.
Sky-T1: Open-source model with o1-level performance, trained for under $450 (novasky-ai.github.io)
I built a Gemini-powered AI that detects and fixes Python errors with reasoning (medium.com)
In the ever-evolving landscape of software development, automated tools that can help diagnose and fix code issues are becoming increasingly valuable. Today, I’m excited to share a project that leverages Google’s Gemini model to automatically detect and fix errors in Python code. This tool not only identifies issues but also provides reasoned explanations for its fixes, making it an excellent learning resource for developers.
My AI/LLM predictions for the next 1, 3 and 6 years (simonwillison.net)
The Oxide and Friends podcast has an annual tradition of asking guests to share their predictions for the next 1, 3 and 6 years.
How we used GPT-4o for image detection with 350 similar illustrations (pages.dev)
Show HN: Freeact – A Lightweight Library for Code-Action Based Agents (github.com/gradion-ai)
freeact is a minimalistic agent library that empowers language models to act as autonomous agents through executable code actions.