Tied Crosscoders: Tracing How Chat LLM Behavior Emerges from Base Model (lesswrong.com) We are interested in model-diffing: finding what is new in the chat model compared to the base model. One way of doing this is training a crosscoder, which here just means training an SAE on the concatenated activations of a given layer from the base and chat models. When training this crosscoder, we find some latents whose decoder vector mostly helps reconstruct the base model activation and barely affects the reconstruction of the chat model activation.
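A minimal sketch of that setup, assuming a PyTorch-style SAE over the concatenated base/chat activations; the class and function names, dimensions, and the 0.9 norm-ratio threshold are illustrative assumptions, not the post's implementation:

```python
# Sketch: a crosscoder treated as an SAE on concatenated [base; chat] activations.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        # Encoder reads the concatenated base and chat activations (2 * d_model).
        self.encoder = nn.Linear(2 * d_model, n_latents)
        # Decoder reconstructs both halves; its weight splits per latent into a
        # base part and a chat part.
        self.decoder = nn.Linear(n_latents, 2 * d_model)
        self.d_model = d_model

    def forward(self, base_act: torch.Tensor, chat_act: torch.Tensor):
        x = torch.cat([base_act, chat_act], dim=-1)
        # Sparse codes; an L1 sparsity penalty would be added during training (not shown).
        latents = torch.relu(self.encoder(x))
        recon = self.decoder(latents)
        return recon[..., : self.d_model], recon[..., self.d_model :], latents

def base_only_latents(model: Crosscoder, threshold: float = 0.9):
    """Latents whose decoder vector lies almost entirely in the base half."""
    W = model.decoder.weight                      # shape: (2 * d_model, n_latents)
    base_norm = W[: model.d_model].norm(dim=0)    # per-latent norm of the base part
    chat_norm = W[model.d_model :].norm(dim=0)    # per-latent norm of the chat part
    ratio = base_norm / (base_norm + chat_norm + 1e-8)
    return torch.nonzero(ratio > threshold).squeeze(-1)

# Example usage: flag latents whose decoder norm is >90% in the base half.
cc = Crosscoder(d_model=512, n_latents=4096)
base_latent_ids = base_only_latents(cc, threshold=0.9)
```

Symmetrically, latents whose decoder norm sits almost entirely in the chat half would flag behavior that is new in the chat model.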
Beyond Diffusion: Inductive Moment Matching (lumalabs.ai) There is a growing sentiment in the AI community that generative pre-training is reaching a limit. However, we argue that these limits are not due to a lack of data itself, but rather a stagnation in algorithmic innovation.
Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3 (allenai.org) Following the success of our Tülu 3 release in November, we are thrilled to announce the launch of Tülu 3 405B, the first application of fully open post-training recipes to the largest open-weight models. With this release, we demonstrate the scalability and effectiveness of our post-training recipe applied at the 405B-parameter scale.
Explaining Large Language Models Decisions Using Shapley Values (arxiv.org) The emergence of large language models (LLMs) has opened up exciting possibilities for simulating human behavior and cognitive processes, with potential applications in various domains, including marketing research and consumer behavior analysis.
Ilya Sutskever NeurIPS talk [video] (youtube.com) OpenAI’s cofounder and former chief scientist, Ilya Sutskever, made headlines earlier this year after he left to start his own AI lab called Safe Superintelligence Inc.
Ethical Challenges Related to the NeurIPS 2024 Best Paper Award (var-integrity-report.github.io) To the AI Research Community: This report is written to convey our serious concerns about the recent recipient of the Best Paper award at NeurIPS 2024, Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction (VAR). While we acknowledge that this NeurIPS paper is technically sound, we must emphasize that it involves serious misconduct by the first author (Keyu Tian), which fundamentally undermines the core values of integrity and trust upon which our academic community is built.
GPTs Are Maxed Out (thealgorithmicbridge.com) March 2024. OpenAI CEO Sam Altman joins podcaster Lex Fridman for the second time since ChatGPT came out a year prior. The stakes are high and anticipation is tangible. GPT-5 appears to be around the corner. Altman, elusive as always, provides only one data point for us hungry spectators: the next-gen model (he doesn’t name it) will be better than GPT-4 to the same degree that GPT-4 was better than GPT-3.
Hunyuan-Large: An Open-Source MoE Model with 52B Activated Parameters (arxiv.org) In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens.
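As a rough illustration of the total-versus-activated distinction (this is a generic top-k MoE sketch, not Hunyuan-Large's actual architecture; all sizes and names below are made up), a routed layer only runs a few experts per token, so the parameters touched per token are a small fraction of the total:

```python
# Toy top-k mixture-of-experts layer: many experts exist (total parameters),
# but only k run per token (activated parameters).
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):
            for w, e in zip(weights[t], idx[t]):
                # Only k experts run for this token; the rest stay idle, so the
                # activated parameter count is roughly k / n_experts of the total.
                out[t] += w * self.experts[int(e)](x[t])
        return out

# Example: 8 experts in total, but only 2 are activated per token.
layer = TopKMoE(d_model=16, d_ff=64, n_experts=8, k=2)
tokens = torch.randn(4, 16)
print(layer(tokens).shape)  # torch.Size([4, 16])
```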