Hacker News with Generative AI: Deep Learning

InstantStyle: Free Lunch Towards Style-Preserving in Text-to-Image Generation (github.com/instantX-research)
InstantStyle is a general framework that employs two straightforward yet potent techniques for achieving an effective disentanglement of style and content from reference images.
Spark-TTS: Text-2-Speech Model Single-Stream Decoupled Tokens [pdf] (arxiv.org)
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.
ChatGPT is made from 100M of these [The Perceptron] [video] (youtube.com)
DiffRhythm: Fast End-to-End Full-Length Song Generation with Latent Diffusion (aslp-lab.github.io)
Recent advancements in music generation have garnered significant attention, yet existing approaches face critical limitations.
Cautious Optimizers: Improving Training with One Line of Code (arxiv.org)
AdamW has been the default optimizer for transformer pretraining. For many years, our community has searched for faster and more stable optimizers, with only limited positive outcomes. In this work, we propose a single-line modification in PyTorch to any momentum-based optimizer, which we rename the cautious optimizer, e.g. C-AdamW and C-Lion.
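For intuition, here is a minimal PyTorch sketch of the cautious masking idea: the proposed momentum update is zeroed wherever it disagrees in sign with the current gradient, then rescaled so the average update magnitude is roughly preserved. The variable names and the surrounding step function are illustrative, not the paper's reference implementation.

    import torch

    @torch.no_grad()
    def cautious_update(param, grad, update, lr):
        # Keep only the coordinates where the proposed update (e.g. the momentum
        # buffer) and the current gradient point the same way: the "cautious" mask.
        mask = (update * grad > 0).to(update.dtype)
        # Rescale so the mean update magnitude stays comparable to the unmasked step.
        mask = mask * (mask.numel() / (mask.sum() + 1))
        # Standard descent step using the masked update.
        param.add_(update * mask, alpha=-lr)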
DeepSeek-V3/R1 Inference System Overview (github.com/deepseek-ai)
The optimization objectives of serving DeepSeek-V3/R1 inference are: higher throughput and lower latency.
DeepSeek Open Source Optimized Parallelism Strategies, 3 repos (github.com/deepseek-ai)
Here, we publicly share profiling data from our training and inference framework to help the community better understand the communication-computation overlap strategies and low-level implementation details.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling (github.com/deepseek-ai)
DeepGEMM is a library designed for clean and efficient FP8 General Matrix Multiplications (GEMMs) with fine-grained scaling, as proposed in DeepSeek-V3. It supports both normal and Mixture-of-Experts (MoE) grouped GEMMs. Written in CUDA, the library requires no compilation during installation; instead, it compiles all kernels at runtime using a lightweight Just-In-Time (JIT) module.
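As a rough illustration of what "fine-grained scaling" means here, the NumPy sketch below gives each 1x128 block of the operands its own scale factor and de-scales every partial product during accumulation. The block size follows the DeepSeek-V3 recipe, but the simulated quantization and layout are assumptions, not DeepGEMM's CUDA kernels.

    import numpy as np

    FP8_MAX = 448.0   # max magnitude representable in float8 e4m3
    BLOCK = 128       # per-block scaling granularity (DeepSeek-V3 style)

    def quantize_blocks(x):
        # Scale every 1x128 block of x by its own factor and clip to the FP8 range.
        # (Real kernels cast to float8_e4m3; here we only scale and clip.)
        m, k = x.shape
        xb = x.reshape(m, k // BLOCK, BLOCK)
        scale = np.abs(xb).max(axis=-1, keepdims=True) / FP8_MAX + 1e-12
        q = np.clip(xb / scale, -FP8_MAX, FP8_MAX)
        return q.reshape(m, k), scale.squeeze(-1)

    def scaled_gemm(a, b):
        # C = A @ B accumulated block by block, de-scaling each partial product
        # with the two blocks' scale factors. Assumes K is a multiple of BLOCK.
        qa, sa = quantize_blocks(a)      # (M, K), (M, K/BLOCK)
        qb, sb = quantize_blocks(b.T)    # (N, K), (N, K/BLOCK)
        m, k = a.shape
        c = np.zeros((m, b.shape[1]), dtype=np.float32)
        for i in range(k // BLOCK):
            s = slice(i * BLOCK, (i + 1) * BLOCK)
            c += (qa[:, s] @ qb[:, s].T) * sa[:, i:i + 1] * sb[:, i:i + 1].T
        return c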
Awesome DeepSeek Integrations (github.com/deepseek-ai)
Integrate the DeepSeek API into popular software. Access the DeepSeek Open Platform to get an API key.
DeepSeek open source DeepEP – library for MoE training and Inference (github.com/deepseek-ai)
DeepEP is a communication library tailored for Mixture-of-Experts (MoE) and expert parallelism (EP). It provides high-throughput and low-latency all-to-all GPU kernels, also known as MoE dispatch and combine. The library also supports low-precision operations, including FP8.
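To make the terms concrete, here is a single-process PyTorch sketch of what dispatch and combine do in an MoE layer: dispatch groups tokens by their top-k experts, and combine scatter-adds the gate-weighted expert outputs back. DeepEP implements these as all-to-all kernels across expert-parallel GPUs; the names and shapes below are illustrative only.

    import torch

    def dispatch(tokens, gate_logits, top_k=2):
        # Return, per expert, the tokens routed to it plus the routing metadata.
        probs = gate_logits.softmax(dim=-1)
        weights, expert_ids = probs.topk(top_k, dim=-1)        # both (T, k)
        buckets = []
        for e in range(gate_logits.shape[-1]):
            token_idx, slot = (expert_ids == e).nonzero(as_tuple=True)
            buckets.append((token_idx, weights[token_idx, slot], tokens[token_idx]))
        return buckets

    def combine(buckets, expert_fns, num_tokens, dim):
        # Run each expert on its bucket and scatter-add the weighted outputs back.
        out = torch.zeros(num_tokens, dim)
        for (token_idx, w, x), expert in zip(buckets, expert_fns):
            out.index_add_(0, token_idx, expert(x) * w.unsqueeze(-1))
        return out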
<think> </think> QwQ-Max-Preview (qwenlm.github.io)
We’re happy to unveil QwQ-Max-Preview, the latest advancement in the Qwen series, designed to push the boundaries of deep reasoning and versatile problem-solving.
DeepDive in everything of Llama3: revealing detailed insights and implementation (github.com/therealoliver)
Implement Llama3 inference step by step, grasp the core concepts, master the process derivation, and implement the code.
DeepSeek Open Infra: Open-Sourcing 5 AI Repos in 5 Days (github.com/deepseek-ai)
We're a tiny team @deepseek-ai pushing our limits in AGI exploration.
Show HN: A new fork of OpenDeepResearcher with DeepSeek R1 (youtube.com)
DeepSeek Native Sparse Attention (arxiv.org)
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant challenges.
Google Titans Model Explained: The Future of Memory-Driven AI Architectures (medium.com)
Imagine trying to solve a puzzle with pieces scattered across miles. That’s the challenge modern AI models face when processing long sequences of data.
The secret ingredients of word2vec (2016) (ruder.io)
This post will discuss the factors that account for the success of word2vec and its connection to more traditional models.
Animate Anyone 2: High-Fidelity Character Image Animation (humanaigc.github.io)
Recent character image animation methods based on diffusion models, such as Animate Anyone, have made significant progress in generating consistent and generalizable character animations. However, these approaches fail to produce reasonable associations between characters and their environments.
Softmax forever, or why I like softmax (kyunghyuncho.me)
Ask HN: Is anybody building an alternative transformer? (ycombinator.com)
Curious if anybody out there is trying to build a new model/architecture that would succeed the transformer?
The Curse of Depth in Large Language Models (huggingface.co)
In this paper, we introduce the Curse of Depth, a concept that highlights, explains, and addresses the recent observation in modern Large Language Models (LLMs) that nearly half of the layers are less effective than expected.
Ask HN: Has anyone used DeepSeek in production? (ycombinator.com)
There's been a lot of hype about the model for its low inference costs and chain-of-thought reasoning, but has anyone had actual success using it for production use cases?
Goku Flow Based Video Generative Foundation Models (github.com/Saiyan-World)
Goku is a new family of joint image-and-video generation models based on rectified flow Transformers.
Open R1: Update #2 (huggingface.co)
We are now two weeks into the Open R1 project which aims to reconstruct the missing pieces of DeepSeek R1—specifically, the training pipeline and synthetic data.
Scaling up test-time compute with latent reasoning: A recurrent depth approach (arxiv.org)
We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space.
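A toy PyTorch sketch of that idea: a shared block is applied repeatedly to a latent state, with the number of iterations chosen at inference time to trade extra compute for quality. The module sizes and the way the input is re-injected each step are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class RecurrentDepthLM(nn.Module):
        def __init__(self, dim=256, vocab=32000):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.core = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            self.inject = nn.Linear(2 * dim, dim)   # mixes the input back in each step
            self.head = nn.Linear(dim, vocab)

        def forward(self, tokens, num_steps=8):
            e = self.embed(tokens)
            s = torch.zeros_like(e)                 # latent state, refined in place
            for _ in range(num_steps):              # more steps = more test-time compute
                s = self.core(self.inject(torch.cat([s, e], dim=-1)))
            return self.head(s)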
DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL (notion.site)
Show HN: KTransformers:671B DeepSeek-R1 on a Single Machine-286 tokens/s Prefill (github.com/kvcache-ai)
KTransformers, pronounced as Quick Transformers, is designed to enhance your 🤗 Transformers experience with advanced kernel optimizations and placement/parallelism strategies.
Deepseek VL2 Small (huggingface.co)
Run Deepseek from fast NVMe drives (github.com/BlinkDL)
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
Value-Based Deep RL Scales Predictably (arxiv.org)
Scaling data and compute is critical to the success of machine learning. However, scaling demands predictability: we want methods to not only perform well with more compute or data, but also have their performance be predictable from small-scale runs, without running the large-scale experiment.
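As a hedged illustration of the kind of extrapolation the abstract refers to, the sketch below fits a simple power law to a handful of small-budget runs and predicts performance at a larger budget; the functional form and the numbers are placeholders, not the paper's fits or data.

    import numpy as np

    # Placeholder small-scale runs: training budget (FLOPs) vs. observed return.
    compute = np.array([1e16, 3e16, 1e17, 3e17])
    returns = np.array([120.0, 180.0, 250.0, 330.0])

    # Fit log(return) = a * log(compute) + b and extrapolate to a larger budget.
    a, b = np.polyfit(np.log(compute), np.log(returns), 1)
    target = 1e18
    predicted = np.exp(a * np.log(target) + b)
    print(f"predicted return at {target:.0e} FLOPs: {predicted:.0f}")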