Gemini 3.1 Pro Tops 13 of 16 Major AI Benchmarks — Including 80.6% SWE-bench and 94.3% GPQA Diamond

Models · Apr 29, 2026 · Artificial Analysis

Google's Gemini 3.1 Pro now leads on 13 of 16 major AI benchmarks, making it the overall benchmark leader.

Key scores

SWE-bench (coding): 80.6%
GPQA Diamond (expert science): 94.3% — highest of any model
ARC-AGI-2: 77.1%
LM Council reasoning: 94.1%

Multimodal and context

Gemini 3.1 Pro has a 2-million-token context window and works natively across text, image, audio, and video in a single model. It is available to Google AI Pro subscribers in the US and to Gemini Ultra subscribers globally.