HuggingFace/open-r1: open reproduction of DeepSeek-R1
(github.com/huggingface)
A fully open reproduction of DeepSeek-R1. This repo is a work in progress; let's build it together!
DeepSeek Outpaced OpenAI at 3% of the Cost
(venturebeat.com)
DeepSeek R1’s Monday release has sent shockwaves through the AI community, disrupting assumptions about what’s required to achieve cutting-edge AI performance.
DeepSeek R1 Runs at 200 Tokens per Second on Raspberry Pi
(nextbigfuture.com)
Overnight tests have confirmed that experimenters have the open-source DeepSeek R1 running at 200 tokens per second on a non-internet-connected Raspberry Pi.
Titans: Learning to Memorize at Test Time
(arxiv.org)
Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention.
ε, a Nuisance No More
(zna.do)
For a while now I have been advocating for tuning ε in various parts of the modern deep learning stack, and in this post I’ll explain why.
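The post's point is easiest to see in code: ε shows up both inside normalization layers and in Adam's update denominator, and both are exposed as tunable arguments. A minimal PyTorch sketch follows; the values shown are ordinary library defaults, not recommendations from the post.

```python
import torch
import torch.nn as nn

# eps inside LayerNorm: added to the variance before the reciprocal square root,
# so it bounds how aggressively small-variance features get rescaled.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.LayerNorm(512, eps=1e-5),
    nn.GELU(),
    nn.Linear(512, 10),
)

# eps inside Adam: added to sqrt(v_hat) in the denominator, so it caps the
# effective step size when second-moment estimates are tiny. Worth sweeping
# rather than leaving at the library default.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, eps=1e-8)
```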
Intrinsic Dimensions: How Learning in Large Models Is Driven by a Few Parameters
(medium.com)
Learned over-parameterized models inherently have a low intrinsic dimension (Li et al.¹ and Aghajanyan et al.³). To understand this concept better, let's delve into the following questions:
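The measurement behind this claim (Li et al.) trains the model inside a fixed random subspace: the full parameter vector is reconstructed as θ = θ₀ + Pz and only the d-dimensional vector z is optimized; the intrinsic dimension is roughly the smallest d that recovers most of full training's performance. A hedged sketch of the idea in PyTorch, using an illustrative dense Gaussian projection (the papers use more memory-efficient projections):

```python
import torch
import torch.nn as nn
from torch.func import functional_call

# Base model to be trained entirely inside a random d-dimensional subspace.
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
names, shapes = zip(*[(n, p.shape) for n, p in model.named_parameters()])
theta0 = torch.cat([p.detach().flatten() for p in model.parameters()])
D, d = theta0.numel(), 300              # d is the candidate intrinsic dimension
P = torch.randn(D, d) / d ** 0.5        # fixed random projection (never trained)
z = torch.zeros(d, requires_grad=True)  # the only trainable parameters

def forward(x):
    flat = theta0 + P @ z               # reconstruct the full parameter vector
    params, offset = {}, 0
    for name, shape in zip(names, shapes):
        n = shape.numel()
        params[name] = flat[offset:offset + n].view(shape)
        offset += n
    return functional_call(model, params, (x,))

opt = torch.optim.SGD([z], lr=1e-2)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(forward(x), y)
loss.backward()
opt.step()  # sweep d upward; the smallest d that recovers ~90% of full
            # training's accuracy is taken as the task's intrinsic dimension
```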
Show HN: DeepFace – A lightweight deep face recognition library for Python
(github.com/serengil)
DeepFace is a lightweight face recognition and facial attribute analysis (age, gender, emotion and race) framework for python.
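A short usage sketch based on the library's documented high-level entry points (verify and analyze); file paths are placeholders.

```python
from deepface import DeepFace

# 1:1 verification: are these two photos of the same person?
result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg")
print(result["verified"], result["distance"])

# Attribute analysis; recent versions return one dict per detected face.
faces = DeepFace.analyze(img_path="img1.jpg",
                         actions=["age", "gender", "emotion", "race"])
print(faces[0]["age"], faces[0]["dominant_emotion"])
```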
1.58-Bit Flux
(chenglin-yang.github.io)
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images.
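For intuition, here is a minimal sketch of ternary weight quantization with a per-tensor scale, in the style of BitNet b1.58's absmean scheme; it only illustrates the {-1, 0, +1} representation and is not the quantizer used for FLUX.

```python
import torch

def quantize_ternary(w: torch.Tensor):
    """Absmean-style ternary quantization: scale, round, clip to {-1, 0, +1}."""
    scale = w.abs().mean().clamp(min=1e-8)   # per-tensor scale
    q = (w / scale).round().clamp(-1, 1)     # ternary codes
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q * scale                         # approximate reconstruction

w = torch.randn(4, 4)
q, s = quantize_ternary(w)
print(q)                                     # entries are -1., 0. or 1.
print((w - dequantize(q, s)).abs().mean())   # mean quantization error
```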
Tenstorrent Wormhole Series
(corsix.org)
A company called Tenstorrent designs and sells PCIe cards for AI acceleration. At the time of writing, they've recently started shipping their Wormhole n150s and Wormhole n300s cards.
Beyond Gradient Averaging in Parallel Optimization
(arxiv.org)
We introduce Gradient Agreement Filtering (GAF) to improve on gradient averaging in distributed deep learning optimization.
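The abstract doesn't spell out the filter, but the idea is to combine gradients only when they agree. Below is a hedged sketch of one plausible reading, using cosine distance between workers' gradients with an illustrative threshold; it is not necessarily the paper's exact rule.

```python
import torch

def cosine_distance(g1: torch.Tensor, g2: torch.Tensor, eps: float = 1e-12):
    return 1.0 - torch.dot(g1, g2) / (g1.norm() * g2.norm() + eps)

def filtered_average(grads, max_dist: float = 0.97):
    """Fold per-worker gradients into a running average, rejecting any
    gradient whose cosine distance to the current average is too large."""
    agg, kept = grads[0].clone(), 1
    for g in grads[1:]:
        if cosine_distance(agg, g) < max_dist:
            agg = (agg * kept + g) / (kept + 1)
            kept += 1
    return agg, kept

grads = [torch.randn(1000) for _ in range(8)]  # flattened per-worker gradients
avg, kept = filtered_average(grads)
print(f"kept {kept} of {len(grads)} gradients")
```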
Hallucination of closed repeat proteins containing central pockets (2023)
(nature.com)
Inspired by these proteins, we devised a deep-learning-based approach to broadly exploring the space of closed repeat proteins starting from only a specification of the repeat number and length.
The Structure of Neural Embeddings
(seanpedersen.github.io)
A small collection of insights on the structure of embeddings (latent spaces) produced by deep neural networks.
Exploring LoRA – Part 1: The Idea Behind Parameter Efficient Fine-Tuning
(medium.com)
Pre-trained large language models undergo extensive training on vast internet data, resulting in exceptional performance across a broad spectrum of tasks. Nonetheless, most real-world scenarios require the model to possess expertise in a particular, specialized domain.
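The core trick LoRA uses for this specialization is to freeze the pre-trained weight and learn only a low-rank update ΔW = BA with rank r much smaller than the layer width, shrinking the trainable parameter count by orders of magnitude. A minimal sketch of a LoRA-wrapped linear layer, using the conventional naming and scaling rather than the article's code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                                # freeze W and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))   # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 12,288 trainable values vs. 590,592 in the full layer
```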
No More Adam: Learning Rate Scaling at Initialization Is All You Need
(arxiv.org)
In this work, we question the necessity of adaptive gradient methods for training deep neural networks.
FastVideo: a lightweight framework for accelerating large video diffusion models
(github.com/hao-ai-lab)
FastVideo is a lightweight framework for accelerating large video diffusion models.
Veo 2: Our video generation model
(deepmind.google)
Veo creates videos with realistic motion and high-quality output, up to 4K. Explore different styles and find your own with extensive camera controls.
Founder who built Snap's AI launches a snappy new take on video chatbots
(techcrunch.com)
A deep learning scientist whose last startup was acquired by Snap to build its My AI chatbot has raised seed funding for his latest venture: a platform for building and operating real-time, video-based conversational AI agents.
From Unemployment to Lisp: Running GPT-2 on a Teen's Deep Learning Compiler
(github.com/hikettei)
This repository is still in the early stages of development and includes many experimental approaches. Please consider it a place to experiment with my ideas, and do not use it in a product under any circumstances.