Hacker News with Generative AI: Generative AI

Latent Space Guardrails That Reduce Hallucinations by 43 Percent Now Open Source (ycombinator.com)
Heyah,

This is Lukasz. I run Wisent, a representation engineering company. I created guardrails that let you block certain patterns of LLM activations at the latent-space level. It is now fully self-hosted and open source. Think stopping hallucinations, harmful LLM outputs, or bad code generation. Let me know how it works for your use case; happy to help you generate the most value from it.

Check out more at https://www.wisent.ai/ or https://www.lukaszbartoszcze.com/
Yann LeCun, Pioneer of AI, Thinks Today's LLMs Are Nearly Obsolete (newsweek.com)
Ask Yann LeCun—Meta's chief AI scientist, Turing Award winner, NYU data scientist and one of the pioneers of artificial intelligence—about the future of large language models (LLMs) like OpenAI's ChatGPT, Google's Gemini, Meta's Llama and Anthropic's Claude, and his answer might startle you: He believes LLMs will be largely obsolete within five years.
Multi-Token Attention (arxiv.org)
Soft attention is a critical mechanism powering LLMs to locate relevant parts within a given context. However, individual attention weights are determined by the similarity of only a single query and key token vector. This "single token attention" bottlenecks the amount of information used in distinguishing a relevant part from the rest of the context.
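The bottleneck the abstract describes is easiest to see in how standard soft attention computes its weights: each weight is derived from the dot-product similarity of a single query vector and a single key vector, then normalized with a softmax. A minimal NumPy sketch of that single-token formulation (shapes and names are illustrative, not the paper's notation):

```python
import numpy as np

def attention_weights(Q, K):
    """Standard soft attention: each weight depends only on one
    query/key pair's scaled dot-product similarity."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n_q, n_k) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)  # softmax over keys

# Two query tokens attending over three key tokens
rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))
K = rng.standard_normal((3, 4))
W = attention_weights(Q, K)
print(W.shape)         # one weight per (query, key) pair: (2, 3)
print(W.sum(axis=-1))  # each query's weights sum to 1
```

Multi-token attention, as the abstract frames it, is about letting each weight condition on more than one query/key pair rather than this single dot product.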
What, exactly, is an 'AI Agent'? Here's a litmus test (tines.com)
AI Ambivalence (nolanlawson.com)
I’ve avoided writing this post for a long time, partly because I try to avoid controversial topics these days, and partly because I was waiting to make my mind up about the current, all-consuming, conversation-dominating topic of generative AI.
Show HN: Open Responses – Self-hosted OpenAI Responses API, works with any model (github.com/julep-ai)
Open Responses lets you run a fully self-hosted version of OpenAI's Responses API. It works seamlessly with any large language model (LLM) provider—whether it's Claude, Qwen, Deepseek R1, Ollama, or others. It's a fully-compatible drop-in replacement for the official API. Swap out OpenAI without changing your existing Agents SDK code.
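Being a drop-in replacement means existing clients should only need to be pointed at the self-hosted endpoint rather than rewritten. A minimal sketch of the idea; the URL, port, and model name below are placeholder assumptions, so check the repository's README for the real defaults:

```python
import json

# Hypothetical self-hosted endpoint; host, port, and path are assumptions.
OPEN_RESPONSES_URL = "http://localhost:8080/v1/responses"

def build_responses_payload(model: str, user_input: str) -> dict:
    """Build a Responses-API-style request body. The field names follow
    OpenAI's Responses API ("model" and "input"); verify against the
    Open Responses docs before relying on them."""
    return {"model": model, "input": user_input}

payload = build_responses_payload("deepseek-r1", "Summarize this thread.")

# An existing client would typically be repointed rather than rewritten,
# e.g. OpenAI(base_url=OPEN_RESPONSES_URL, api_key="unused") with the
# official Python SDK, which accepts a base_url override.
print(json.dumps(payload))
```

The payload shape is what makes the "swap without changing your Agents SDK code" claim work: the self-hosted server accepts the same request body the official API does.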
Wikipedia is struggling with voracious AI bot crawlers (engadget.com)
Wikimedia has seen a 50 percent increase in bandwidth used for downloading multimedia content since January 2024, the foundation said in an update. But it's not because human readers have suddenly developed a voracious appetite for consuming Wikipedia articles and for watching videos or downloading files from Wikimedia Commons. No, the spike in usage came from AI crawlers, or automated programs scraping Wikimedia's openly licensed images, videos, articles and other files to train generative artificial intelligence models.
Grok, built by xAI, has labeled Musk as the top misinformation spreader on X (twitter.com)
UCSD: Large Language Models Pass the Turing Test (arxiv.org)
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations.
We need a better term for GenAI output – "slop" is too benign (rockpapershotgun.com)
Earlier this month, Snail Games put out a widely and justifiably clowned-on genAI trailer for Ark: Survival Evolved's Aquatica DLC.
Text to Bark from ElevenLabs [video] (youtube.com)
LLM providers on the cusp of an 'extinction' phase as capex realities bite (theregister.com)
Gartner says the market for large language model (LLM) providers is on the cusp of an extinction phase as it grapples with the capital-intensive costs of building products in a competitive market.
OpenAI plans to release a new 'open' AI language model in the coming months (techcrunch.com)
OpenAI says that it intends to release its first “open” language model since GPT‑2 “in the coming months.”
Everything is Ghibli (carly.substack.com)
OpenAI unleashed its native image generation in ChatGPT on Tuesday. By Wednesday morning, every social feed was drowning in Studio Ghibli-style portraits. (LinkedIn, check back next week.) What happened, and why, is another signal of where AI, art, and our attention are headed.
GPT-4o helped me re-create classic cartoons with myself as a character (twitter.com)
LLM Workflows then Agents: Getting Started with Apache Airflow (github.com/astronomer)
This repository contains an SDK for working with LLMs from Apache Airflow, based on Pydantic AI. It allows users to call LLMs and orchestrate agent calls directly within their Airflow pipelines using decorator-based tasks. The SDK leverages the familiar Airflow @task syntax with extensions like @task.llm, @task.llm_branch, and @task.agent.
DeepSeek surpasses ChatGPT in new monthly visits (indiatimes.com)
Mirrors: The Blind Spot of Image and Video Generation Models (medium.com)
Recent advances in image generation models have demonstrated remarkable capabilities in creating photorealistic and imaginative visuals. However, a persistent challenge remains: accurately rendering reflections in mirrors.
Runway Gen-4 (runwayml.com)
A new generation of consistent and controllable media is here.
RLHF Is Cr*P, It's a Paint Job on a Rusty Car: Geoffrey Hinton (officechai.com)
RLHF, or Reinforcement Learning from Human Feedback, is behind some of the recent advances in AI, but one of the pioneers of the field doesn’t think highly of it.
Amazon introduces Nova Chat (aboutamazon.com)
Amazon makes it easier for developers and tech enthusiasts to explore Amazon Nova, its advanced Gen AI models
Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison (composio.dev)
Google just launched Gemini 2.5 Pro on March 26th, claiming it is the best at coding, reasoning, and just about everything else. But I mostly care about how the model compares against the best available coding model, Claude 3.7 Sonnet (thinking), released at the end of February. I have been using it, and it has been a great experience.
GPT-4o draws itself as a consistent type of guy (danielpaleka.com)
When asked to draw itself as a person, the ChatGPT Create Image feature introduced on March 25, 2025, consistently portrays itself as a white male in his 20s with brown hair, often sporting facial hair and glasses.
AI Experts Say We're on the Wrong Path to Achieving Human-Like AI (gizmodo.com)
According to a panel of hundreds of artificial intelligence researchers, the field is currently pursuing artificial general intelligence the wrong way.
What Anthropic Researchers Found After Reading Claude's 'Mind' Surprised Them (singularityhub.com)
Despite popular analogies to thinking and reasoning, we have a very limited understanding of what goes on in an AI’s “mind.”
What is Zombie Prompting: in 5 simple images (twitter.com)
Qwen2.5-Omni Technical Report (huggingface.co)
In this report, we present Qwen2.5-Omni, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
Karpathy's 'Vibe Coding' Movement Considered Harmful (nmn.gl)
Last Tuesday at 1 AM, I was debugging a critical production issue in my AI dev tool. As I dug through layers of functions, I suddenly realized — unlike the new generation of developers, I was grateful that I could actually understand my codebase. That’s when I started thinking more about Karpathy’s recent statements on vibe coding.
Show HN: I built a tool that generates quizzes from documents using LLMs (ycombinator.com)
Hey everyone! I recently built this little side project that takes any document you upload and turns it into practice quizzes using LLMs to generate the questions: https://www.cuiz-ai.com
Ask HN: Are LLMs just answering what we want to hear? (ycombinator.com)
I keep seeing tweets and posts where users ask ChatGPT or a similar LLM to describe them, and it always answers with positive, flattering stuff that reinforces what the user wants to hear.