Hacker News with Generative AI: Machine Learning

Show HN: One Prompt – A super framework for prompt engineering (github.com/VaibhavAcharya)
A library that brings structure, validation, and reliability to prompt engineering.
Grant Sanderson: Visualizing transformers and attention [video] (youtube.com)
Decisions and Dragons (decisionsanddragons.com)
A guide to the perilous world of reinforcement learning.
AI can learn to think before it speaks (ft.com)
Large-Scale Dimension Reduction with Both Global and Local Structure (2021) [pdf] (jmlr.org)
Niantic announces “Large Geospatial Model” trained on Pokémon Go player data (nianticlabs.com)
At Niantic, we are pioneering the concept of a Large Geospatial Model that will use large-scale machine learning to understand a scene and connect it to millions of other scenes globally.
PyTorch 101: Understanding Graphs, Automatic Differentiation and Autograd (digitalocean.com)
PyTorch is one of the foremost Python deep learning libraries. It's the go-to choice for deep learning research, and with each passing day more companies and research labs are adopting it.
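A minimal sketch of the dynamic graph and backward pass the tutorial covers: operations on tracked tensors build a graph during the forward pass, and backward() traverses it to fill in gradients. Tensor values here are illustrative, not taken from the article.

```python
# Minimal PyTorch autograd sketch: build a computation graph, then backprop.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)  # leaf tensors tracked by autograd
w = torch.tensor([4.0, 5.0], requires_grad=True)

y = (w * x).sum()   # forward pass records a dynamic graph: mul -> sum
y.backward()        # reverse traversal computes gradients of y w.r.t. leaves

print(x.grad)  # dy/dx = w -> tensor([4., 5.])
print(w.grad)  # dy/dw = x -> tensor([2., 3.])
```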
Batched reward model inference and Best-of-N sampling (raw.sh)
Reward models have been a key part of reinforcement learning on top of LLMs, used broadly in techniques like RLHF and as LLM-as-a-judge critics in evals.
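A rough sketch of Best-of-N sampling on top of a batched reward model: sample N completions, score them all in one reward-model call, and keep the highest-scoring one. The generate_candidates and score_batch callables are placeholders for this illustration, not the post's code.

```python
# Best-of-N sampling sketch; generator and reward-model interfaces are assumed.
from typing import Callable, List

def best_of_n(prompt: str,
              generate_candidates: Callable[[str, int], List[str]],
              score_batch: Callable[[str, List[str]], List[float]],
              n: int = 8) -> str:
    """Sample n completions, score them in one batched reward-model pass,
    and return the highest-scoring completion."""
    candidates = generate_candidates(prompt, n)
    rewards = score_batch(prompt, candidates)   # single batched forward pass
    best_idx = max(range(n), key=lambda i: rewards[i])
    return candidates[best_idx]
```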
Show HN: Nosia – Privacy-focused AI to run models on your own data and device (github.com/nosia-ai)
Nosia is a platform that allows you to run an AI model on your own data. It is designed to be easy to install and use.
Fireworks F1: A Breakthrough in Complex Reasoning with Compound AI (fireworks.ai)
At Fireworks, we believe the future of AI is shifting to compound AI systems that combine specialized models and tools to achieve better performance, reliability, and control compared to a single model.
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation (nirvanalan.github.io)
GaussianAnything generates high-quality and editable surfel Gaussians through a cascaded 3D diffusion pipeline, given single-view images or texts as the conditions.
Awesome-Geo (github.com/DavidHuji)
Awesome list for research on GEO (Generative Engine Optimization).
You could have designed state of the art positional encoding (fleetwood.dev)
This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models. We will achieve this by iteratively improving our approach to encoding position, arriving at Rotary Positional Encoding (RoPE), used in the latest Llama 3.2 release and most modern transformers. This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry, and an understanding of self-attention is expected.
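For concreteness, a minimal sketch of the rotary encoding the post builds toward: each pair of feature dimensions is rotated by an angle proportional to the token position. This uses the common half-split pairing; implementations differ in how dimensions are paired, so treat the details as illustrative rather than the post's exact derivation.

```python
# Minimal RoPE sketch: rotate feature pairs by position-dependent angles.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (seq_len, dim) with even dim. Returns x with rotary encoding applied."""
    seq_len, dim = x.shape
    half = dim // 2
    # per-pair rotation frequency: theta_i = base^(-2i / dim)
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) * 2 / dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]          # split features into rotation pairs
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```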
Show HN: Model2vec – Lightning-fast Static Embeddings for RAG/Semantic Search (github.com/MinishLab)
Model2Vec is a technique to turn any sentence transformer into a really small static model, reducing model size by 15x and making the models up to 500x faster, with a small drop in performance. Our best model is the most performant static embedding model in the world. See our results here, or dive in to see how it works.
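To illustrate what "static" buys you at query time: once per-token vectors are precomputed, embedding a sentence is just table lookups and a mean, with no transformer forward pass. This is a generic sketch of that idea, not Model2Vec's actual distillation pipeline (which also involves dimensionality reduction and token weighting).

```python
# Static-embedding lookup sketch: sentence vector = mean of precomputed token vectors.
import numpy as np

def embed(sentence: str, vocab: dict[str, int], table: np.ndarray) -> np.ndarray:
    """vocab maps token -> row index; table is (vocab_size, dim) of precomputed vectors."""
    ids = [vocab[t] for t in sentence.lower().split() if t in vocab]
    if not ids:
        return np.zeros(table.shape[1])
    return table[ids].mean(axis=0)  # no model inference: lookups plus a mean
```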
Show HN: AnyModal – Train Your Own Multimodal LLMs (github.com/ritabratamaiti)
AnyModal is a modular and extensible framework for integrating diverse input modalities (e.g., images, audio) into large language models (LLMs). It enables seamless tokenization, encoding, and language generation using pre-trained models for various modalities.
Towards Nyquist Learners (gwern.net)
Overview of differential geometry for Hamiltonian Monte Carlo (arxiv.org)
Hamiltonian Monte Carlo has proven a remarkable empirical success, but only recently have we begun to develop a rigorous understanding of why it performs so well on difficult problems and how it is best applied in practice.
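For readers who want the mechanics alongside the geometry, here is a bare-bones HMC step with a leapfrog integrator. The target's log_prob and grad_log_prob are assumed given, and the step size and path length are illustrative, not tuned values from the paper.

```python
# One HMC step: resample momentum, leapfrog-integrate, Metropolis accept/reject.
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_steps=20, rng=np.random):
    p = rng.normal(size=q.shape)                      # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # leapfrog integration of Hamiltonian dynamics
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    for _ in range(n_steps - 1):
        q_new += step_size * p_new
        p_new += step_size * grad_log_prob(q_new)
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    # accept/reject on the joint (position, momentum) energy
    current_h = -log_prob(q) + 0.5 * np.sum(p ** 2)
    proposed_h = -log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if rng.uniform() < np.exp(current_h - proposed_h) else q
```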
Convolutional Differentiable Logic Gate Networks (arxiv.org)
With the increasing inference cost of machine learning models, there is a growing interest in models with fast and efficient inference.
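A toy sketch of the underlying idea: Boolean gates are replaced by probabilistic relaxations on inputs in [0, 1], and a learned softmax over gate types makes the choice of gate differentiable. This is simplified to four gate types for illustration (the differentiable logic gate networks this paper builds on consider all two-input gates), and it omits the convolutional structure that is the paper's contribution.

```python
# Differentiable logic gate sketch: soft gate relaxations + learned gate choice.
import torch

def soft_gates(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Stack probabilistic relaxations of a few two-input gates."""
    return torch.stack([
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1 - a * b,          # NAND
    ], dim=-1)

class DiffLogicGate(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(4))  # one weight per gate type

    def forward(self, a, b):
        weights = torch.softmax(self.logits, dim=0)       # soft selection of a gate
        return (soft_gates(a, b) * weights).sum(dim=-1)
```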
Show HN: ColiVara – State of the Art RAG API with Vision Models (github.com/tjmlabs)
ColiVara = COntextualized Late Interaction Vision Augmented Retrieval API
Numpyro: Probabilistic programming with NumPy powered by Jax (github.com/pyro-ppl)
NumPyro is a lightweight probabilistic programming library that provides a NumPy backend for Pyro. We rely on JAX for automatic differentiation and JIT compilation to GPU / CPU. NumPyro is under active development, so beware of brittleness, bugs, and changes to the API as the design evolves.
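A minimal NumPyro model, inferring a Gaussian mean and scale with NUTS; the data and priors are illustrative.

```python
# Small NumPyro example: Bayesian inference of a Gaussian's mean and scale.
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(data):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(5.0))
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=data)

data = jnp.array([1.2, 0.8, 1.5, 0.9, 1.1])
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), data)
mcmc.print_summary()
```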
Using VS Code to track and visualize ML experiments (visualstudio.com)
Run, compare, visualize, and track machine learning experiments right in VS Code. This extension uses DVC, an open-source data versioning and ML experiment management tool. No additional services or databases are required.
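A hedged sketch of the logging side that the extension reads, using DVCLive's Python API as I recall it (method names are an assumption to verify against the docs); the training loop is a placeholder.

```python
# Logging metrics with DVCLive so DVC-based tooling can track the experiment.
from dvclive import Live

with Live() as live:
    live.log_param("lr", 0.01)
    for epoch in range(10):
        train_loss = 1.0 / (epoch + 1)       # placeholder for a real training loop
        live.log_metric("train/loss", train_loss)
        live.next_step()                      # advance the experiment step counter
```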
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (rccchoudhury.github.io)
We present Run-Length Tokenization (RLT), a simple and efficient approach to speed up video transformers by removing redundant tokens from the input.
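A simplified sketch of the idea: compare each patch to the same patch in the previous frame and drop tokens that barely changed. The threshold and tensor shapes are illustrative, and the full method additionally encodes how many frames each kept token covers (its run length).

```python
# Run-length-style token pruning sketch: keep only patches that changed between frames.
import torch

def prune_static_patches(patches: torch.Tensor, tau: float = 0.1):
    """patches: (frames, num_patches, dim). Returns kept tokens and the keep mask."""
    frames, num_patches, _ = patches.shape
    keep = torch.ones(frames, num_patches, dtype=torch.bool)
    if frames > 1:
        diff = (patches[1:] - patches[:-1]).abs().mean(dim=-1)  # per-patch change vs previous frame
        keep[1:] = diff > tau            # drop near-duplicate (static) patches
    tokens = patches[keep]               # only changed patches reach the transformer
    return tokens, keep
```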
1-Bit AI Infrastructure (arxiv.org)
Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption.
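As a rough sketch of the ternary quantization behind the "1.58-bit" variant: scale weights by their mean absolute value, then round and clip to {-1, 0, +1}. This follows the absmean scheme described for BitNet b1.58; the paper's infrastructure work goes well beyond this single kernel.

```python
# Absmean ternarization sketch: weights become {-1, 0, +1} plus one scale factor.
import torch

def absmean_ternarize(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)      # per-tensor scaling factor
    w_q = (w / scale).round().clamp(-1, 1)     # ternary weights
    return w_q, scale                          # dequantize as w_q * scale
```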
How AI is beating VCs in their own game (arxiv.org)
Traditional decision tree algorithms are explainable but struggle with non-linear, high-dimensional data, limiting their applicability in complex decision-making.
The barriers to AI engineering are crumbling fast (helix.ml)
A couple of weeks ago, I gave a talk at Hannah Foxwell’s amazing AI for the Rest of Us conference about something that's been brewing in my mind after years of working in DevOps, MLOps, and now GenAI: the barriers to AI engineering are crumbling fast. The tools have gotten good enough that if you can handle an IDE and push some YAML to git, you're already qualified.
The geometry of data: the missing metric tensor and the Stein score [Part II] (christianperone.com)
I’m writing this second part of the series because I couldn’t find any formalisation of the metric tensor that naturally arises from the Stein score (especially when used with learned models), much less blog posts or articles about it, which is surprising given the deep connection between score-based generative models, diffusion models, and the geometry of the data manifold.
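For reference, the Stein score and the Fisher-information-style metric built from its outer product; whether the post's construction is exactly this outer-product form is an assumption.

```latex
% Stein score of a density p(x) and a Fisher-style metric from its outer product.
\[
  s(x) \;=\; \nabla_x \log p(x),
  \qquad
  G \;=\; \mathbb{E}_{x \sim p}\!\left[\, s(x)\, s(x)^{\top} \right].
\]
```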
Show HN: I made a scam detector and built a public database for it (antiphish.ai)
Since our founding, we've been dedicated to securing your online experience. Join the community of long-time users who trust us with their protection.
Play Dialog: A contextual turn-taking TTS model like NotebookLM Playground (play.ai)
You're Not Testing Your AI Well Enough (tryreva.com)
Large Language Models (LLMs) have revolutionised machine learning, offering unprecedented versatility across various tasks. However, this flexibility poses a significant challenge: how do we effectively evaluate LLMs to ensure they’re suitable for specific applications?
Benchmarking Vision, Language, and Action Models on Robotic Learning Tasks (multinet.ai)
Vision-language-action (VLA) models represent a promising direction for developing general-purpose robotic systems, demonstrating the ability to combine visual understanding, language comprehension, and action generation.