Hacker News with Generative AI: Optimization Techniques

Looking Back at Speculative Decoding (research.google)
Speculative decoding has proven to be an effective technique for faster and cheaper LLM inference without compromising quality. It has also become a useful paradigm for a range of other optimization techniques.
Scalable self-improvement for compiler optimization (research.google)
We introduce Iterative BC-Max, a novel technique that aims to reduce the size of compiled binaries by improving inlining decisions. We describe several benefits of this approach over off-the-shelf reinforcement learning (RL) algorithms.