Hacker News with Generative AI: AI Models

Sky-T1: Train your own O1 preview model within $450 (novasky-ai.github.io)
We introduce Sky-T1-32B-Preview, our reasoning model that performs on par with o1-preview on popular reasoning and coding benchmarks. Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently. All code is open-source.
Notes on the New Deepseek v3 (composio.dev)
Deepseek released their flagship model, v3, a 671B mixture-of-experts model with 37B active parameters. It is currently the best open-source model, beating Llama 3.1 405B, Qwen, and Mistral. On benchmarks, it is on par with OpenAI GPT-4o and Claude 3.5 Sonnet. It is the first model to perform on par with, and on some tasks better than, the big closed models.
We fine-tuned Llama and got 4.2x Sonnet 3.5 accuracy for code generation (finecodex.com)
You don't need to pay for OpenAI - Gemini 2.0 Free & More (github.com/EliasPereirah)
Orion is a web-based chat interface that simplifies interactions with multiple AI model providers.
Google releases its own 'reasoning' AI model (techcrunch.com)
Google has released what it’s calling a new “reasoning” AI model — but it’s in the experimental stages, and from our brief testing, there’s certainly room for improvement.
Show HN: Anthropic's MCP Server Directory (glama.ai)
Model Context Protocol (MCP) is an open protocol that enables AI models to interact with local and remote resources through standardized server implementations.
Gemini 2.0: our new AI model for the agentic era (google)
Google DeepMind introduces Gemini 2.0, a new AI model designed for the "agentic era."
32k context length text embedding models (voyageai.com)
TL;DR – We are excited to announce voyage-3 and voyage-3-lite embedding models, advancing the frontier of retrieval quality, latency, and cost. voyage-3 outperforms OpenAI v3 large by 7.55% on average across all evaluated domains, including code, law, finance, multilingual, and long-context, with 2.2x lower costs and 3x smaller embedding dimension, resulting in 3x lower vectorDB costs. voyage-3-lite offers 3.82% better retrieval accuracy than OpenAI v3 large while costing 6x less and having 6x smaller embedding dimension.
Omnivision-968M: Vision Language Model with 9x Tokens Reduction for Edge Devices (nexa.ai)
Show HN: A real time AI video agent with under 1 second of latency (ycombinator.com)
Hey it’s Hassaan & Quinn – co-founders of Tavus, an AI research company and developer platform for video APIs. We’ve been building AI video models for ‘digital twins’ or ‘avatars’ since 2020.
Llama can now see and run on your device – welcome Llama 3.2 (huggingface.co)
Llama 3.2 is out! Today we welcome the next iteration of the Llama collection to Hugging Face. This time, we’re excited to collaborate with Meta on the release of multimodal and small models. Ten open-weight models (5 multimodal models and 5 text-only ones) are available on the Hub.
Two new Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more (googleblog.com)
Pixtral 12B (mistral.ai)
Pixtral 12B - the first-ever multimodal Mistral model. Apache 2.0.
Smaller Gemini 1.5 Flash-8B, stronger Gemini 1.5 Pro, improved Gemini 1.5 Flash (twitter.com)
Gemini Pro 1.5 experimental "version 0801" available for early testing (deepmind.google)
Llama 3.1: 405B, the largest openly available model released (github.com/meta-llama)
OpenAI is releasing GPT-4o Mini, a cheaper, smarter model (theverge.com)
My finetuned models beat OpenAI's GPT-4 (mlops.systems)
OpenPipe Mixture of Agents: Outperform GPT-4 at 1/25th the Cost (openpipe.ai)
Snowflake releases a flagship generative AI model of its own (techcrunch.com)
Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat (huggingface.co)