Meta has released Llama 4 Scout and Maverick, the first models in the Llama family to use a Mixture-of-Experts (MoE) architecture, a design that routes each token through only a small subset of the model's parameters, sharply reducing the compute required per token.
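To make the idea concrete, here is a minimal, illustrative sparse-MoE layer in PyTorch. It is not Meta's implementation (Llama 4's routing and expert layout differ); it only shows the core mechanism of a router sending each token to one of several expert feed-forward networks.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy sparse MoE layer: a router picks one expert per token, so only a
    fraction of the layer's parameters is exercised for any given input."""

    def __init__(self, d_model: int = 64, d_ff: int = 256, n_experts: int = 16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:       # x: (tokens, d_model)
        weights, chosen = self.router(x).softmax(-1).max(-1)  # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = chosen == e                                 # tokens routed to expert e
            if mask.any():
                out[mask] = weights[mask, None] * expert(x[mask])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(8, 64)).shape)  # each of the 8 tokens used only 1 of 16 experts
```

Only the chosen expert's weights participate in the forward pass for a given token, which is why a model's active parameter count can be much smaller than its total parameter count.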
Scout: built for long context
Llama 4 Scout has 17 billion active parameters spread over 16 experts (109 billion total parameters) and a 10-million-token context window, the largest of any openly available model at launch. This lets Scout process entire codebases, legal documents, or research archives in a single pass.
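Whether a given corpus actually fits is easy to sanity-check. The sketch below is a rough, assumption-laden estimate (about four characters per token; real tokenizers vary by language and file type), not an official sizing tool.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4          # rough heuristic; real tokenizers vary
CONTEXT_WINDOW = 10_000_000  # Scout's advertised context length, in tokens

def estimate_tokens(root: str, suffixes=(".py", ".md", ".txt")) -> int:
    """Crude token estimate for all matching files under `root`."""
    total_chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
    return total_chars // CHARS_PER_TOKEN

tokens = estimate_tokens(".")
print(f"~{tokens:,} estimated tokens; fits in one Scout pass: {tokens <= CONTEXT_WINDOW}")
```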
Maverick: built for performance
Llama 4 Maverick is the higher-capability variant, pairing the same 17 billion active parameters with 128 experts (roughly 400 billion total parameters) and posting stronger results on reasoning and coding benchmarks. Meta positions it as competitive with closed frontier models on most standard evaluations.
Commercial use
Both models are available for commercial use under the Llama 4 Community License, which still requires companies with more than 700 million monthly active users to request a separate license from Meta. The weights can be downloaded from Hugging Face; Meta says Scout runs on a single H100-class GPU with Int4 quantization, while Maverick targets a single H100 DGX host.
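For reference, a minimal download sketch using the huggingface_hub client is below. The repository ID follows Meta's naming pattern but should be treated as an assumption; check the model card for the exact identifier, and note that access is gated behind accepting the license (log in with huggingface-cli login first).

```python
from huggingface_hub import snapshot_download

# Repo ID is assumed from Meta's naming convention; verify it on the model card.
local_dir = snapshot_download(
    repo_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    local_dir="./llama-4-scout",
)
print(f"Weights saved to {local_dir}")
```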
Meta says the MoE architecture reduces inference costs by approximately 60% compared to a dense model of equivalent quality — a significant advantage for large-scale deployments.
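The efficiency claim follows from the gap between active and total parameters. The arithmetic below is purely illustrative and assumes, as a first-order approximation, that per-token compute scales with the number of parameters actually activated; real inference cost also depends on memory bandwidth, batching, and serving infrastructure.

```python
# Back-of-the-envelope arithmetic under the stated assumption.
scout_total_params  = 109e9   # all experts must be held in memory
scout_active_params = 17e9    # parameters actually run for any single token

active_fraction = scout_active_params / scout_total_params
print(f"Scout activates {active_fraction:.1%} of its parameters per token")

# A dense model runs 100% of its parameters for every token, so a sparse model
# of comparable quality does proportionally less compute per token; this is the
# basis of Meta's claimed ~60% inference-cost reduction.
```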