Google's Gemini 3.1 Pro has established itself as the overall benchmark leader, topping 13 of 16 major benchmarks.
Key scores
- SWE-bench (coding): 80.6%
- GPQA Diamond (expert science): 94.3% — highest of any model
- ARC-AGI-2: 77.1%
- LM Council reasoning: 94.1%
Multimodal and context
Gemini 3.1 Pro has a 2-million-token context window and handles text, image, audio, and video natively in a single model. It is available to Google AI Pro subscribers in the US and to Gemini Ultra subscribers globally.