Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
(cerebras.ai)
Frontier AI now runs at instant speed. Last week we ran a customer workload on Llama 3.1 405B at 969 tokens/s – a new record for Meta’s frontier model. Llama 3.1 405B on Cerebras is by far the fastest frontier model in the world – 12x faster than GPT-4o and 18x faster than Claude 3.5 Sonnet. In addition, we achieved the highest performance at 128K context length and shortest time-to-first-token latency, as measured by Artificial Analysis.
Why Is Light So Fast?
(profmattstrassler.com)
I’m often asked two very natural and related questions.
New standards for a faster and more private Internet
(cloudflare.com)
As the Internet grows, so do the demands for speed and security.