Hacker News with Generative AI: Attention Mechanisms

DeepSeek Native Sparse Attention (arxiv.org)
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant challenges.
Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had a tremendous impact on many sequence-related tasks, largely due to their ability to retrieve information from any part of the sequence via softmax-based dot-product attention.
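For reference, the softmax-based dot-product attention the excerpt refers to can be sketched in a few lines. This is a minimal single-head, unbatched version with illustrative shapes, not code from the paper:

```python
# Minimal sketch of scaled dot-product attention (single head, no batching).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Each query position can retrieve from every key position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1 over all positions
    return weights @ V                   # weighted retrieval of value vectors

# Toy usage: 5 tokens, 8-dimensional head
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(5, 8)) for _ in range(3))
print(dot_product_attention(Q, K, V).shape)  # (5, 8)
```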
Differential Transformer (arxiv.org)
Transformers tend to over-allocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to relevant context while canceling noise.
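The "amplify relevant, cancel noise" idea can be read as taking the difference of two softmax attention maps, so attention mass that both maps assign to irrelevant positions cancels. A minimal single-head sketch under that reading follows; the fixed scalar lam, the shapes, and the function names are illustrative simplifications, not the paper's full formulation:

```python
# Sketch of a differential-attention step: the output uses the difference of
# two softmax attention maps, so shared (noise) attention mass cancels.
# lam is a fixed scalar here purely for illustration.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    d_k = Q1.shape[-1]
    A1 = softmax(Q1 @ K1.T / np.sqrt(d_k))  # first attention map
    A2 = softmax(Q2 @ K2.T / np.sqrt(d_k))  # second attention map
    return (A1 - lam * A2) @ V              # subtraction cancels common noise

rng = np.random.default_rng(0)
Q1, K1, Q2, K2, V = (rng.normal(size=(5, 8)) for _ in range(5))
print(diff_attention(Q1, K1, Q2, K2, V).shape)  # (5, 8)
```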
New attention mechanisms that outperform standard multi-head attention (arxiv.org)
Ring Attention Explained – Unlocking Near Infinite Context Window (coconut-mode.com)