Hacker News with Generative AI: Attention Mechanisms

Infinite Retrieval: Attention enhanced LLMs in long-context processing (arxiv.org)
Limited by the context window size of large language models (LLMs), handling tasks whose input tokens exceed that limit has been challenging, whether for a simple direct retrieval task or a complex multi-hop reasoning task.
DeepSeek Native Sparse Attention (arxiv.org)
Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses a significant challenge.
Laser: Attention with Exponential Transformation (arxiv.org)
Transformers have had a tremendous impact on several sequence-related tasks, largely due to their ability to retrieve from any part of the sequence via softmax-based dot-product attention.
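The retrieval mechanism the abstract refers to can be sketched in a few lines: each query forms a softmax-weighted mixture over all values in the sequence. This is a minimal illustrative sketch of standard scaled dot-product attention, not code from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Scaled dot-product attention: each query retrieves a
    # softmax-weighted mix of values from the whole sequence.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n_q, n_k) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (n_q, d_v) retrieved values

rng = np.random.default_rng(0)
n, d = 6, 8
Q, K, V = rng.normal(size=(3, n, d))
out = dot_product_attention(Q, K, V)
print(out.shape)  # (6, 8)
```

Because the weights come from a softmax, every position receives some nonzero attention, which is what mechanisms like Laser and Diff Transformer set out to reshape.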
Differential Transformer (arxiv.org)
Transformers tend to over-allocate attention to irrelevant context. In this work, we introduce Diff Transformer, which amplifies attention to the relevant context while canceling noise.
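The noise-canceling idea can be sketched for a single head: compute two softmax attention maps from separate query/key projections and subtract one from the other, so attention mass that both maps assign to irrelevant tokens cancels. The function name and the scalar `lam` below are illustrative stand-ins (in the paper the subtraction weight is a learned parameter), not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    # Differential attention (sketch): subtract a second softmax map,
    # scaled by lam, from the first so shared "noise" attention cancels.
    d = Q1.shape[-1]
    a1 = softmax(Q1 @ K1.T / np.sqrt(d))  # (n, n)
    a2 = softmax(Q2 @ K2.T / np.sqrt(d))  # (n, n)
    return (a1 - lam * a2) @ V            # (n, d_v)

rng = np.random.default_rng(1)
n, d = 5, 4
Q1, K1, Q2, K2 = rng.normal(size=(4, n, d))
V = rng.normal(size=(n, d))
out = diff_attention(Q1, K1, Q2, K2, V)
print(out.shape)  # (5, 4)
```

Unlike a single softmax map, the differential map can assign near-zero (even slightly negative) weight to context tokens, which is how it sharpens attention onto the relevant ones.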
New attention mechanisms that outperform standard multi-head attention (arxiv.org)
Ring Attention Explained – Unlocking Near Infinite Context Window (coconut-mode.com)