Hacker News with Generative AI: AI

Show HN: Arrakis – Open-source, self-hostable sandboxing service for AI Agents (github.com/abshkbh)
AI agents can generate malicious or buggy code that can attack the host system it's run on.
Wikipedia is struggling with voracious AI bot crawlers (engadget.com)
Wikimedia has seen a 50 percent increase in bandwidth used for downloading multimedia content since January 2024, the foundation said in an update. But it's not because human readers have suddenly developed a voracious appetite for consuming Wikipedia articles and for watching videos or downloading files from Wikimedia Commons. No, the spike in usage came from AI crawlers, or automated programs scraping Wikimedia's openly licensed images, videos, articles and other files to train generative artificial intelligence models.
We need a better term for GenAI output – "slop" is too benign (rockpapershotgun.com)
Earlier this month, Snail Games put out a widely and justifiably clowned-on genAI trailer for Ark: Survival Evolved's Aquatica DLC.
Kai Scheduler: Kubernetes Native scheduler for AI workloads at large scale (github.com/NVIDIA)
KAI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.
Show HN: VaporVibe – auto-generate video demos for vibe-coded projects (influme.ai)
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad (arxiv.org)
Recent math benchmarks for large language models (LLMs) such as MathArena indicate that state-of-the-art reasoning models achieve impressive performance on mathematical competitions like AIME, with the leading model, o3-mini, achieving scores comparable to top human competitors.
Text to Bark from ElevenLabs [video] (youtube.com)
Why MCP Is Mostly Bullshit (lycee.ai)
If you follow the AI space closely, you’ve surely noticed the increased interest in MCP (Model Context Protocol).
How AI is creating a rift at McKinsey, Bain, and BCG (the-ken.com)
Clients’ increasing access to AI tools is transforming the way consulting firms operate
Show HN: Qwen-2.5-32B is now the best open source OCR model (github.com/getomni-ai)
A benchmarking tool that compares OCR and data extraction capabilities of different large multimodal models such as gpt-4o, evaluating both text and JSON extraction accuracy. The goal of this benchmark is to publish a comprehensive benchmark of OCR accuracy across traditional OCR providers and multimodal Language Models. The evaluation dataset and methodologies are all Open Source, and we encourage expanding this benchmark to encompass any additional providers.
Dual RTX 5090 Beats $25,000 H100 in Real-World LLM Performance (hardware-corner.net)
AI enthusiasts looking for top-tier performance in local LLMs have long considered NVIDIA’s H100 to be the gold standard for inference, thanks to its high-bandwidth HBM3 memory and optimized tensor cores. However, recent benchmarks show that a dual RTX 5090 setup, while still pricey, outperforms the H100 in sustained output token generation, making it an ideal choice for those seeking the best possible performance for home use, especially for models up to 70B parameters.
NaNoWriMo shut down after AI, content moderation scandals (techcrunch.com)
NaNoWriMo, a 25-year-old online writing community-turned-nonprofit, announced on Monday evening that it is shutting down.
LLM providers on the cusp of an 'extinction' phase as capex realities bite (theregister.com)
Gartner says the market for large language model (LLM) providers is on the cusp of an extinction phase as it grapples with the capital-intensive costs of building products in a competitive market.
OpenAI plans to release a new 'open' AI language model in the coming months (techcrunch.com)
OpenAI says that it intends to release its first “open” language model since GPT‑2 “in the coming months.”
Show HN: Neuronpedia, an open source platform for AI interpretability (neuronpedia.org)
Neuronpedia is an open source interpretability platform.
Ask HN: What's the best way to get started with LLM-assisted programming? (ycombinator.com)
Currently, I use Perplexity or ChatGPT via the web prompt for small coding tasks, but sometimes I'll use Ollama. Stuff like writing a shell script to perform some task, or maybe a small Python function. I'd like to get to the next level, but I don't know where to start.
Aim: Supercharged open-source experiment tracker (github.com/aimhubio)
Aim logs your training runs and any AI metadata, provides a beautiful UI to compare and observe them, and offers an API to query them programmatically.
Launch HN: Augento (YC W25) – Fine-tune your agents with reinforcement learning (ycombinator.com)
Hi HN, we’re the cofounders of Augento (https://augento.ai/). We’re building Deepseek R1-like fine-tuning as a service. You connect your agent, tell us when it’s right or wrong, and we deliver an LLM optimized for that agent.
DeepSeek surpasses ChatGPT in new monthly visits (indiatimes.com)
There is no Vibe Engineering (serce.me)
You've probably heard about "vibe coding" by now. The term was recently coined by Andrej Karpathy in his tweet. Andrej defines Vibe Coding as "a new kind of coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists". The key difference between vibe coding and normal coding is that the engineer doesn’t interact with the codebase directly, and instead converses with the agent and inspects the final outcome.
Show HN: AI-powered reading companion that helps you read hard books (collabai.live)
Show HN: I built an open-source NotebookLM alternative using Morphik (github.com/morphik-org)
Morphik is an open-source database designed for AI applications that simplifies working with unstructured data. It provides advanced RAG (Retrieval Augmented Generation) capabilities with multi-modal support, knowledge graphs, and intuitive APIs.
There's too much content, so I built an AI knowledge assistant (faraazahmad.github.io)
Have you ever opened a new tab for an article that seemed really interesting but you just don’t have the time to read it right now? But then your day ends, and you just don’t have it in you to go through a whole article, so you leave it open.
Unsupervised Learning of Browser Agents via Environment Interaction in the Wild (arxiv.org)
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents.
Karpathy's 'Vibe Coding' Movement Considered Harmful (nmn.gl)
Last Tuesday at 1 AM, I was debugging a critical production issue in my AI dev tool. As I dug through layers of functions, I suddenly realized — unlike the new generation of developers, I was grateful that I could actually understand my codebase. That’s when I started thinking more about Karpathy’s recent statements on vibe coding.
Samsung Galaxy AI features can be set to on-device-only processing (tomsguide.com)
Show HN: Cloud-Ready Postgres MCP Server (github.com/stuzero)
A Model Context Protocol (MCP) server for PostgreSQL databases with enhanced capabilities for AI agents.
Breaking up with vibe coding (lucasaguiar.xyz)
We’ve all been there: headphones on, music pumping, fingers flying across the keyboard, lost in the “flow” with your favorite AI agent. This, my friends, is vibe coding. It’s when you’re in the zone, seemingly effortlessly producing code.
I Built an LLM Framework in Just 100 Lines – Here Is Why (zacharyhuang.substack.com)
Have you ever stared at a complex AI framework and wondered, “Does it really need to be this complicated?” After a year of struggling with bloated frameworks, I decided to strip away anything unnecessary. The result is Pocket Flow, a minimalist LLM framework in just 100 lines of code.
Show HN: Job Application Bot by Ollama AI (github.com/lookr-fyi)
JobHuntr.fyi is a native macOS desktop app that uses Ollama-powered AI to apply for jobs on LinkedIn—automatically, 24/7. No OpenAI API key required.