Hacker News with Generative AI: AI Models

TinyChat15M: 15M param conversational model designed to run with 60 MB RAM (github.com/starhopp3r)
TinyChat15M is a 15-million parameter conversational language model built on the Meta Llama 2 architecture.
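One possible reading of the 60 MB figure is simple arithmetic on the weights. The sketch below assumes 32-bit floats (4 bytes per parameter) and counts weights only; the listing does not state the precision actually used, and the helper name `weight_memory_mb` is hypothetical.

```python
def weight_memory_mb(n_params, bytes_per_param):
    """Approximate memory for the weights alone, in megabytes (10^6 bytes)."""
    return n_params * bytes_per_param / 1_000_000

# Assumption: float32 weights, 4 bytes each (not confirmed by the source).
print(weight_memory_mb(15_000_000, 4))  # 60.0
# For comparison, float16 weights would need half of that.
print(weight_memory_mb(15_000_000, 2))  # 30.0
```

Activations, tokenizer data, and runtime overhead are not included in this estimate.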
Gemini Advanced is free for college students through finals 2026 (gemini.google)
Students get free access to our best AI model with Gemini Advanced. Prep for your exams, perfect your writing, and tackle your homework with the best of Google AI: Gemini Advanced, NotebookLM Plus, Whisk. Plus free 2TB of storage. Available in the US only. Sign up by June 30, 2025.
Docker Model Runner (docker.com)
Generative AI is transforming software development, but building and running AI models locally is still harder than it should be.
Show HN: Comparing product rankings by OpenAI, Anthropic, and Perplexity (productrank.ai)
Meta got caught gaming LMArena (theverge.com)
With Llama 4, Meta fudged benchmarks to appear as though its new AI model is better than the competition.
Amazon introduces Nova Chat (aboutamazon.com)
Amazon makes it easier for developers and tech enthusiasts to explore Amazon Nova, its advanced Gen AI models.
DeepSeek releases their latest DeepSeek v3 model, now featuring an MIT license (simonwillison.net)
Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name DeepSeek-V3-0324.
Google calls Gemma 3 the most powerful AI model you can run on one GPU (theverge.com)
A little over a year after releasing two “open” Gemma AI models built from the same technology behind its Gemini AI, Google is updating the family with Gemma 3.
voyage-3-large (voyageai.com)
TL;DR – Introducing voyage-3-large, a new state-of-the-art general-purpose and multilingual embedding model that ranks first across eight evaluated domains spanning 100 datasets, including law, finance, and code. It outperforms OpenAI-v3-large and Cohere-v3-English by an average of 9.74% and 20.71%, respectively. Enabled by Matryoshka learning and quantization-aware training, voyage-3-large supports smaller dimensions and int8 and binary quantization that dramatically reduce vectorDB costs with minimal impact on retrieval quality.
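The ideas of smaller dimensions and int8 or binary quantization mentioned above can be illustrated with a minimal Python sketch. This is not Voyage AI's implementation or API: the function names and the example vector are hypothetical, and the sketch assumes embeddings are roughly unit-normalized so each component lies in [-1, 1].

```python
import math

def truncate_embedding(vec, dim):
    """Matryoshka-style shortening: keep the first `dim` components, then re-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def quantize_int8(vec):
    """Map floats in [-1, 1] to integers in [-127, 127], one byte per component."""
    return [max(-127, min(127, round(x * 127))) for x in vec]

def quantize_binary(vec):
    """Keep only the sign of each component, packed 8 components per byte."""
    bits = [1 if x > 0 else 0 for x in vec]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk += [0] * (8 - len(chunk))  # pad the last byte with zeros
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

# Hypothetical 8-dimensional embedding, for illustration only.
example = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.3, 0.4]
print(truncate_embedding(example, 4))  # [0.5, -0.5, 0.5, -0.5]
print(quantize_binary(example))        # one byte holding the sign bits 10101101
```

Under these assumptions, a vector stored as float32 needs 4 bytes per component, int8 needs 1 byte, and binary needs 1 bit, which is where the storage savings in a vector database come from.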
Gemma3 – The current strongest model that fits on a single GPU (ollama.com)
Gemma is a lightweight family of models from Google built on Gemini technology. The Gemma 3 models are multimodal, processing text and images, and feature a 128K context window with support for over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel in tasks like question answering, summarization, and reasoning, while their compact design allows deployment on resource-limited devices.
Evaluating Mistral OCR Against Gemini 2.0 Flash (reducto.ai)
Today, Mistral AI released a new OCR model, claiming to be state-of-the-art (SOTA) on unreleased benchmarks. We decided to put the model to the test.
Amazon says that Alexa+ is 'model agnostic' (techcrunch.com)
Amazon says that the new and improved Alexa unveiled on Wednesday, Alexa+, is powered by a “model agnostic” system that’s always using the “best” AI model for any given task.
Claude 3.7 Sonnet and Claude Code (anthropic.com)
Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model on the market.
How to Run DeepSeek R1 Distilled Reasoning Models on RyzenAI and Radeon GPUs (guru3d.com)
DeepSeek R1: Don't Put All Your Eggs in One LLM Basket (notdiamond.ai)
Over the last week, the world has been on fire because of DeepSeek’s new R1 reasoning model. But the stock predictions surrounding R1 don’t matter, and neither do the conspiracy theories. Even the model itself, while impressive, doesn’t really matter. DeepSeek R1 really matters because it means the number of frontier AI models is about to explode.
Sky-T1: Train your own O1 preview model within $450 (novasky-ai.github.io)
We introduce Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks. Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently. All code is open-source.
Notes on the New Deepseek v3 (composio.dev)
DeepSeek released their flagship model, v3, a 671B mixture-of-experts model with 37B active parameters. Currently, it is the best open-source model, beating Llama 3.1 405B, Qwen, and Mistral. Based on the benchmarks, it is on par with OpenAI GPT-4o and Claude 3.5 Sonnet. It is the first open model to perform on par with, and on some tasks better than, the big closed models.
We fine-tuned Llama and got 4.2x Sonnet 3.5 accuracy for code generation (finecodex.com)
You don't need to pay for OpenAI - Gemini 2.0 Free & More (github.com/EliasPereirah)
Orion is a web-based chat interface that simplifies interactions with multiple AI model providers.
Google releases its own 'reasoning' AI model (techcrunch.com)
Google has released what it’s calling a new “reasoning” AI model — but it’s in the experimental stages, and from our brief testing, there’s certainly room for improvement.
Show HN: Anthropic's MCP Server Directory (glama.ai)
Model Context Protocol (MCP) is an open protocol that enables AI models to interact with local and remote resources through standardized server implementations.
Gemini 2.0: our new AI model for the agentic era (google)
Google DeepMind introduces Gemini 2.0, a new AI model designed for the "agentic era."
32k context length text embedding models (voyageai.com)
TL;DR – We are excited to announce voyage-3 and voyage-3-lite embedding models, advancing the frontier of retrieval quality, latency, and cost. voyage-3 outperforms OpenAI v3 large by 7.55% on average across all evaluated domains, including code, law, finance, multilingual, and long-context, with 2.2x lower costs and 3x smaller embedding dimension, resulting in 3x lower vectorDB costs. voyage-3-lite offers 3.82% better retrieval accuracy than OpenAI v3 large while costing 6x less and having 6x smaller embedding dimension.
Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices (nexa.ai)
Show HN: A real time AI video agent with under 1 second of latency (ycombinator.com)
Hey it’s Hassaan & Quinn – co-founders of Tavus, an AI research company and developer platform for video APIs. We’ve been building AI video models for ‘digital twins’ or ‘avatars’ since 2020.
Llama can now see and run on your device – welcome Llama 3.2 (huggingface.co)
Llama 3.2 is out! Today we welcome the next iteration of the Llama collection to Hugging Face. This time, we’re excited to collaborate with Meta on the release of multimodal and small models. Ten open-weight models (5 multimodal models and 5 text-only ones) are available on the Hub.
Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more (googleblog.com)
Pixtral 12B (mistral.ai)
Pixtral 12B - the first-ever multimodal Mistral model. Apache 2.0.
Smaller Gemini 1.5 Flash-8B, stronger Gemini 1.5 Pro, improved Gemini 1.5 Flash (twitter.com)
Gemini Pro 1.5 experimental "version 0801" available for early testing (deepmind.google)