Hacker News with Generative AI: Machine Learning

Helix: A Vision-Language-Action Model for Generalist Humanoid Control (figure.ai)
We're introducing Helix, a generalist Vision-Language-Action (VLA) model that unifies perception, language understanding, and learned control to overcome multiple longstanding challenges in robotics.
It's time to become an ML engineer (2022) (gregbrockman.com)
AI has recently crossed a utility threshold, where cutting-edge models such as GPT-3, Codex, and DALL-E 2 are actually useful and can perform tasks computers cannot do any other way.
Run structured extraction on documents/images locally with Ollama and Pydantic (github.com/vlm-run)
Welcome to VLM Run Hub, a comprehensive repository of pre-defined Pydantic schemas for extracting structured data from unstructured visual domains such as images, videos, and documents.
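Below is a minimal sketch of the kind of workflow this enables, using the ollama Python client's structured-output support together with a Pydantic schema. The "Invoice" schema and model name are illustrative assumptions for this example, not schemas shipped by VLM Run Hub.

```python
# Sketch: structured extraction from an image with a Pydantic schema and the
# ollama Python client. The Invoice schema and model name are illustrative,
# not part of VLM Run Hub.
from pydantic import BaseModel
import ollama


class LineItem(BaseModel):
    description: str
    amount: float


class Invoice(BaseModel):
    vendor: str
    total: float
    items: list[LineItem]


response = ollama.chat(
    model="llama3.2-vision",                 # any local vision-capable model
    messages=[{
        "role": "user",
        "content": "Extract the invoice fields from this document.",
        "images": ["invoice.png"],
    }],
    format=Invoice.model_json_schema(),      # constrain output to the schema
)

invoice = Invoice.model_validate_json(response.message.content)
print(invoice.vendor, invoice.total)
```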
I built a large language model "from scratch" (brettgfitzgerald.com)
I’m a machine learning / A.I. hobbyist. The technologies fascinate me, and I can’t seem to learn enough about them. Sebastian Raschka’s book, Build a Large Language Model (From Scratch) caught my eye. I don’t recall how I stumbled on it, but I found it when it was still in early access from Manning Publications. I purchased it, and started working through it as the final chapters were being written and released.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters (huggingface.co)
Implementing LLaMA3 in 100 Lines of Pure Jax (saurabhalone.com)
In this post, we'll implement llama3 from scratch using pure jax in just 100 lines of code. Why jax? Because I think it has good aesthetics. Also, jax looks like a NumPy wrapper, but it has some cool features like XLA (an accelerated linear algebra compiler), jit, vmap, pmap, etc., which make your training go brr brr.
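For a sense of why those features matter, here is a tiny illustration of jit and vmap on a toy scaled dot-product attention function; this is a generic example, not code from the post.

```python
# Toy example of the jax features the post relies on: jit compiles a function
# with XLA, and vmap maps it over a batch axis without explicit loops.
import jax
import jax.numpy as jnp


def attention(q, k, v):
    # q, k, v: (seq_len, head_dim)
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v


# Compile once with XLA, then map over a leading batch axis.
batched_attention = jax.jit(jax.vmap(attention))

kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (8, 16, 32))
k = jax.random.normal(kk, (8, 16, 32))
v = jax.random.normal(kv, (8, 16, 32))
out = batched_attention(q, k, v)
print(out.shape)  # (8, 16, 32)
```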
Native Sparse Attention: Hardware-Aligned, Natively Trainable Sparse Attention (arxiv.org)
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant challenges.
SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork (arxiv.org)
We introduce SWE-Lancer, a benchmark of over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts.
ZeroBench: An Impossible* Visual Benchmark for Contemporary Multimodal Models (zerobench.github.io)
Contemporary LMMs often exhibit remarkable performance on existing visual benchmarks, yet closer inspection reveals persistent shortcomings in their ability to interpret and reason about visual content. Many existing benchmarks tend to become saturated, losing their value as effective measures of the true visual understanding capabilities of frontier models.
Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model (arxiv.org)
We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.
OpenArc – Lightweight Inference Server for OpenVINO (github.com/SearchSavior)
OpenArc is a lightweight inference API backend built on Optimum-Intel (Hugging Face's Intel extension for Transformers), leveraging hardware acceleration on Intel CPUs, GPUs, and NPUs through the OpenVINO runtime and OpenCL drivers.
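As a rough sketch of the Optimum-Intel layer such a server builds on, the snippet below exports a Transformers model to OpenVINO and runs generation; the model id and device string are illustrative, and this is not OpenArc's own API.

```python
# Sketch: running a causal LM through Optimum-Intel's OpenVINO backend.
# Model id and device are examples; OpenArc wraps this kind of stack in an API.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(
    model_id,
    export=True,   # convert the PyTorch checkpoint to OpenVINO IR on the fly
    device="GPU",  # "CPU", "GPU", or "NPU" depending on available hardware
)

inputs = tokenizer("What does OpenVINO accelerate?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```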
The secret ingredients of word2vec (2016) (ruder.io)
This post will discuss the factors that account for the success of word2vec and its connection to more traditional models.
Physics Informed Neural Networks (pages.dev)
I mentioned in a previous post that, as part of my position as a junior DS at IKEA, I also get the opportunity to take part in their training program (called the AI accelerator program). It’s a six-month program in which we are trained on all aspects of data science and AI, both by professionals and through various courses.
Overfitting to Theories of Overfitting (argmin.net)
I ended yesterday’s post arguing that we should remove this from machine learning classes.
Softmax forever, or why I like softmax (kyunghyuncho.me)
Deepseek R1 Distill 8B Q40 on 4 x Raspberry Pi 5 (github.com/b4rtaz)
Deepseek R1 Distill 8B Q40 on 4 x Raspberry Pi 5 8GB
Nature loves patterns (fayziev.com)
As I drift through my winter vacation, detoxing from work, tech and world matters, I notice my little brother crawling toward a heat stove. With a week of rest behind me, my mind wanders into philosophical territory, and I can't help but map this moment to machine learning.
OpenVINO AI effects for Audacity (audacityteam.org)
Intel has built a suite of AI tools for Audacity, useful for spoken word audio and music alike. These AI features run 100% locally on your PC.
LM2: Large Memory Models (arxiv.org)
This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts.
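As a schematic of what an auxiliary memory read could look like in a decoder-only Transformer, here is a generic cross-attention-to-memory step under my own assumptions; it is an illustration of the general idea, not LM2's actual architecture.

```python
# Generic illustration of a memory-augmented Transformer read step: token
# states cross-attend to a learned memory bank and mix the result back in.
# A schematic under my own assumptions, not the LM2 paper's implementation.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def memory_read(hidden, memory, w_q, w_k, w_v, gate=0.5):
    """hidden: (seq, d), memory: (slots, d); returns updated hidden states."""
    q = hidden @ w_q                        # queries from token states
    k = memory @ w_k                        # keys from memory slots
    v = memory @ w_v                        # values from memory slots
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    read = attn @ v                         # content read from memory
    return hidden + gate * read             # gated residual mix-in


rng = np.random.default_rng(0)
d, seq, slots = 64, 10, 16
hidden = rng.normal(size=(seq, d))
memory = rng.normal(size=(slots, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * d ** -0.5 for _ in range(3))
print(memory_read(hidden, memory, w_q, w_k, w_v).shape)  # (10, 64)
```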
Automated Capability Discovery via Foundation Model Self-Exploration (arxiv.org)
Foundation models have become general-purpose assistants, exhibiting diverse capabilities across numerous domains through training on web-scale data.
Advancements in embedding-based retrieval at Pinterest Homefeed (medium.com)
At Pinterest Homefeed, embedding-based retrieval (a.k.a. Learned Retrieval) is a key candidate generator used to retrieve highly personalized, engaging, and diverse content that fulfills various user intents and enables multiple forms of actionability, such as Pin saving and shopping.
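For readers unfamiliar with the technique, here is a generic two-tower retrieval sketch (user embedding scored against item embeddings by dot product, top-k returned); the shapes and scoring are illustrative and not Pinterest's implementation.

```python
# Generic embedding-based retrieval sketch: score a user embedding against a
# corpus of item embeddings and return the top-k candidates. In production an
# approximate nearest-neighbor index would replace the brute-force scan.
import numpy as np

rng = np.random.default_rng(0)
num_pins, dim, k = 10_000, 128, 5

pin_embeddings = rng.normal(size=(num_pins, dim)).astype(np.float32)
user_embedding = rng.normal(size=(dim,)).astype(np.float32)

scores = pin_embeddings @ user_embedding          # dot-product relevance
top_k = np.argpartition(-scores, k)[:k]           # unordered top-k indices
top_k = top_k[np.argsort(-scores[top_k])]         # sort the k candidates
print(top_k, scores[top_k])
```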
Noether's Theorem and Machine Learning (lee-phillips.org)
Emmy Noether discovered her eponymous theorem in the context of Einstein’s General Theory of Relativity. A couple of decades after her death, it became the foundation for modern particle physics. In the final chapter of my book about Noether’s Theorem I survey some of its recent applications far afield from either of these physics contexts—including applications outside of the physical sciences.
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs (emergent-values.ai)
As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values.
LLMs can teach themselves to better predict the future (arxiv.org)
We present an outcome-driven fine-tuning framework that enhances the forecasting capabilities of large language models (LLMs) without relying on human-curated reasoning samples.
The Curious Similarity Between LLMs and Quantum Mechanics (robleclerc.substack.com)
Six months ago, I did a deep dive to understand the transformer architecture and noticed something strange: the concepts behind these models mirror many features and phenomena from quantum mechanics.
DeepSeek-R1 Exhibits Deceptive Alignment: AI That Knows It's Unsafe (ycombinator.com)
I've been testing DeepSeek-R1 and have uncovered a significant AI safety failure: the model demonstrates deceptive alignment.
Open R1: Update #2 (huggingface.co)
We are now two weeks into the Open R1 project which aims to reconstruct the missing pieces of DeepSeek R1—specifically, the training pipeline and synthetic data.
Show HN: Sort lines semantically using llm-sort (github.com/vagos)
LLM plugin for semantically sorting lines. Ranking techniques are based on this paper.
Scaling up test-time compute with latent reasoning: A recurrent depth approach (arxiv.org)
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space.
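To make the general idea concrete, here is a toy illustration of spending variable test-time compute by iterating a shared block on a latent state; this is a schematic of the concept only, under my own assumptions, and not the paper's architecture.

```python
# Toy illustration of recurrent-depth test-time compute: apply a shared block
# to a latent state for a configurable number of iterations before decoding.
# A schematic of the idea, not the paper's model.
import numpy as np


def shared_block(state, weights):
    """One latent-reasoning step: a toy residual update."""
    return state + np.tanh(state @ weights)


def reason_in_latent_space(state, weights, num_iterations):
    """More iterations = more test-time compute with the same parameters."""
    for _ in range(num_iterations):
        state = shared_block(state, weights)
    return state


rng = np.random.default_rng(0)
dim = 32
weights = rng.normal(size=(dim, dim)) * dim ** -0.5
latent = rng.normal(size=(dim,))

cheap = reason_in_latent_space(latent, weights, num_iterations=2)
expensive = reason_in_latent_space(latent, weights, num_iterations=32)
print(np.linalg.norm(cheap - expensive))
```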
Show HN: KTransformers:671B DeepSeek-R1 on a Single Machine-286 tokens/s Prefill (github.com/kvcache-ai)
KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies.