Hacker News with Generative AI: Speed

Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference (cerebras.ai)
Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and shortest time-to-first-token latency, as measured by Artificial Analysis.
6G tests reach a blisteringly quick 938 Gb/s, 5000X faster than 5G (tomshardware.com)
Why Is Light So Fast? (profmattstrassler.com)
I’m often asked two very natural and related questions.
New standards for a faster and more private Internet (cloudflare.com)
As the Internet grows, so do the demands for speed and security.
Cerebras reaches 1800 tokens/s for 8B Llama 3.1 (forbes.com)
Show HN: Beating OpenAI's structured outputs on cost, accuracy and speed (boundaryml.com)
Wuffs has the fastest, safest PNG decoder in the world (2021) (nigeltao.github.io)
Limits to running speed in dogs, horses and humans (biologists.com)
Latest update for 'fast' compression algorithm LZ4 sprints past old versions (theregister.com)
Show HN: Music Generation - 100x Speed Demo (riffusion.com)
Show HN: Voice bots with 500ms response times (cerebrium.ai)
Ancient Star Seen Zooming Through Space at 600 Kilometers per Second (sciencealert.com)
Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy (cerebras.net)
6G speeds hit 100 Gbps in new test – 500 times faster than average 5G cellphones (livescience.com)