Hacker News with Generative AI: AI Models

Amazon introduces Nova Chat (aboutamazon.com)
Amazon makes it easier for developers and tech enthusiasts to explore Amazon Nova, its advanced Gen AI models
DeepSeek releases their latest DeepSeek v3 model, now featuring an MIT license (simonwillison.net)
Chinese AI lab DeepSeek just released the latest version of their enormous DeepSeek v3 model, baking the release date into the name DeepSeek-V3-0324.
Google calls Gemma 3 the most powerful AI model you can run on one GPU (theverge.com)
A little over a year after releasing two “open” Gemma AI models built from the same technology behind its Gemini AI, Google is updating the family with Gemma 3.
voyage-3-large (voyageai.com)
TL;DR – Introducing voyage-3-large, a new state-of-the-art general-purpose and multilingual embedding model that ranks first across eight evaluated domains spanning 100 datasets, including law, finance, and code. It outperforms OpenAI-v3-large and Cohere-v3-English by an average of 9.74% and 20.71%, respectively. Enabled by Matryoshka learning and quantization-aware training, voyage-3-large supports smaller dimensions and int8 and binary quantization that dramatically reduce vectorDB costs with minimal impact on retrieval quality.
Gemma3 – The current strongest model that fits on a single GPU (ollama.com)
Gemma is a lightweight, family of models from Google built on Gemini technology. The Gemma 3 models are multimodal—processing text and images—and feature a 128K context window with support for over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel in tasks like question answering, summarization, and reasoning, while their compact design allows deployment on resource-limited devices.
Evaluating Mistral OCR Against Gemini 2.0 Flash (reducto.ai)
Today, Mistral AI released a new OCR model, claiming to be state-of-the-art (SOTA) on unreleased benchmarks. We decided to put the model to the test.
Amazon says that Alexa+ is 'model agnostic' (techcrunch.com)
Amazon says that the new and improved Alexa unveiled on Wednesday, Alexa+, is powered by a “model agnostic” system that’s always using the “best” AI model for any given task.
Claude 3.7 Sonnet and Claude Code (anthropic.com)
Today, we’re announcing Claude 3.7 Sonnet1, our most intelligent model to date and the first hybrid reasoning model on the market.
How to Run DeepSeek R1 Distilled Reasoning Models on RyzenAI and Radeon GPUs (guru3d.com)
DeepSeek R1: Don't Put All Your Eggs in One LLM Basket (notdiamond.ai)
Over the last week, the world has been on fire because of Deepseek’s new R1 reasoning model. But the stock predictions surrounding R1 don’t matter, and neither do the conspiracy theories. Even the model itself—while impressive—doesn’t really matter. The reason DeepSeek R1 really matters is because it means the number of frontier AI models is about to explode.
Sky-T1: Train your own O1 preview model within $450 (novasky-ai.github.io)
We introduce Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks. Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently. All code is open-source.
Notes on the New Deepseek v3 (composio.dev)
Deepseek released their flagship model, v3, a 607B mixture-of-experts model with 37B active parameters. Currently, it is the best open-source model, beating Llama 3.1 405b, Qwen, and Mistral. It is on par with OpenAI GPT-4o and Claude 3.5 Sonnet from the benchmarks. The first model performs on par and better at some tasks than the big closed models.
We fine-tuned Llama and got 4.2x Sonnet 3.5 accuracy for code generation (finecodex.com)
You don't need to pay for OpenAI - Gemini 2.0 Free & More (github.com/EliasPereirah)
Orion is a web-based chat interface that simplifies interactions with multiple AI model providers.
Google releases its own 'reasoning' AI model (techcrunch.com)
Google has released what it’s calling a new “reasoning” AI model — but it’s in the experimental stages, and from our brief testing, there’s certainly room for improvement.
Show HN: Anthropic's MCP Server Directory (glama.ai)
Model Context Protocol (MCP) is an open protocol that enables AI models to interact with local and remote resources through standardized server implementations.
Gemini 2.0: our new AI model for the agentic era (google)
Google DeepMind introduces Gemini 2.0, a new AI model designed for the "agentic era."
32k context length text embedding models (voyageai.com)
TL;DR – We are excited to announce voyage-3 and voyage-3-lite embedding models, advancing the frontier of retrieval quality, latency, and cost. voyage-3 outperforms OpenAI v3 large by 7.55% on average across all evaluated domains, including code, law, finance, multilingual, and long-context, with 2.2x lower costs and 3x smaller embedding dimension, resulting in 3x lower vectorDB costs. voyage-3-lite offers 3.82% better retrieval accuracy than OpenAI v3 large while costing 6x less and having 6x smaller embedding dimension.
Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices (nexa.ai)
Show HN: A real time AI video agent with under 1 second of latency (ycombinator.com)
Hey it’s Hassaan & Quinn – co-founders of Tavus, an AI research company and developer platform for video APIs. We’ve been building AI video models for ‘digital twins’ or ‘avatars’ since 2020.
Llama can now see and run on your device – welcome Llama 3.2 (huggingface.co)
Llama 3.2 is out! Today we welcome the next iteration of the Llama collection to Hugging Face. This time, we’re excited to collaborate with Meta on the release of multimodal and small models. Ten open-weight models (5 multimodal models and 5 text-only ones) are available on the Hub.
Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more (googleblog.com)
Pixtral 12B (mistral.ai)
Pixtral 12B - the first-ever multimodal Mistral model. Apache 2.0.
Smaller Gemini 1.5 Flash-8B, stronger Gemini 1.5 Pro, improved Gemini 1.5 Flash (twitter.com)
Gemini Pro 1.5 experimental "version 0801" available for early testing (deepmind.google)
Llama 3.1: 405B, the largest openly available model released (github.com/meta-llama)
OpenAI is releasing GPT-4o Mini, a cheaper, smarter model (theverge.com)
My finetuned models beat OpenAI's GPT-4 (mlops.systems)
OpenPipe Mixture of Agents: Outperform GPT-4 at 1/25th the Cost (openpipe.ai)
Snowflake releases a flagship generative AI model of its own (techcrunch.com)