Google Releases Gemini 3.1 Pro: A Massive Leap in AI Reasoning

Google DeepMind launches Gemini 3.1 Pro with a new three-tier thinking system, 1M token context window, and benchmark scores that leave the competition behind — including a 77.1% on ARC-AGI-2.

Google DeepMind has released Gemini 3.1 Pro, its most capable reasoning model to date. Announced on February 19, 2026, this is the first time Google has used a ".1" version increment — every previous mid-cycle update used ".5" — signaling that this is a focused intelligence upgrade rather than a broad feature expansion. And the benchmarks back that up.

The Numbers That Matter

Gemini 3.1 Pro scored 77.1% on ARC-AGI-2, a benchmark that tests whether a model can solve entirely new logic patterns it hasn't seen before. For context, the previous version scored 31.1% on the same test. That's not an incremental improvement — it's a fundamental jump in abstract reasoning ability.

The model also hit 2887 Elo on LiveCodeBench Pro, 94.3% on GPQA Diamond (a graduate-level science benchmark), and claimed the top spot on 13 out of 16 key benchmarks tracked by Google. These include tests for agentic tasks, code generation, mathematical reasoning, and multimodal understanding.

Three-Tier Thinking System

The standout architectural change is a new three-level thinking system. Previous Gemini models had a binary setup — either low or high computational effort. Gemini 3.1 Pro adds a "Medium" tier, giving developers fine-grained control over how much processing power the model dedicates to each request.

This matters more than it sounds. Low thinking mode keeps responses fast and cheap for simple tasks. Medium handles most everyday reasoning without burning through tokens. High mode unlocks the full reasoning chain for complex problems — think multi-step math proofs, intricate code debugging, or analyzing contradictions across long documents. Developers can set the thinking level per request, which means they can optimize both cost and quality without switching models.
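One way a client might route requests across the three tiers can be sketched as a plain helper. The tier names ("low", "medium", "high") come from the release described above; the routing heuristics, the `thinking_level` field, and the payload shape are illustrative assumptions, not Google's actual API:

```python
# Hypothetical helper: pick one of the three thinking tiers per request.
# The tier names come from the article; the heuristics and field names
# below are illustrative assumptions, not the real Gemini API surface.

def choose_thinking_level(prompt: str, needs_multi_step: bool = False) -> str:
    """Pick a thinking tier: cheap for short lookups, full effort for
    multi-step reasoning, medium for everything in between."""
    if needs_multi_step:
        return "high"      # proofs, intricate debugging, cross-document analysis
    if len(prompt) < 200:  # short, simple request: keep it fast and cheap
        return "low"
    return "medium"        # default for everyday reasoning


def build_request(prompt: str, needs_multi_step: bool = False) -> dict:
    """Assemble a per-request payload with the chosen thinking level,
    mirroring the per-request control described above."""
    return {
        "model": "gemini-3.1-pro",
        "contents": prompt,
        "thinking_level": choose_thinking_level(prompt, needs_multi_step),
    }


print(build_request("What is the capital of France?")["thinking_level"])  # low
print(build_request("Prove it step by step.", needs_multi_step=True)["thinking_level"])  # high
```

Because the level travels with each request, one deployment can serve both quick lookups and heavy reasoning, which is the cost/quality trade-off the per-request dial is meant to enable.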

1M Token Context, 64K Output

Gemini 3.1 Pro supports an input context window of just over 1 million tokens and can generate outputs up to 65,536 tokens. The context window isn't new — Gemini 2.5 Pro had the same capacity — but the model is significantly better at actually using it. Long-context retrieval accuracy has improved, meaning the model is less likely to "forget" information buried deep in a massive document.
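A quick way to sanity-check whether a document fits the stated limits is a rough token estimate. The ~4-characters-per-token ratio below is a common rule of thumb for English text, not Gemini's actual tokenizer:

```python
# Rough fit check against the limits stated above: ~1M input tokens,
# 65,536 output tokens. The 4-chars-per-token ratio is a heuristic for
# English text, not Gemini's real tokenizer.

MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 65_536
CHARS_PER_TOKEN = 4  # heuristic estimate

def estimate_tokens(text: str) -> int:
    """Estimate a token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(document: str, requested_output_tokens: int) -> bool:
    """True if the document fits the input window and the requested
    output stays within the 64K output cap."""
    return (estimate_tokens(document) <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)

# A ~2 MB text file (~500K estimated tokens) with a 32K-token reply fits:
print(fits_in_context("x" * 2_000_000, 32_000))  # True
```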

The model handles text, audio, images, video, PDFs, and entire code repositories. This multimodal capability combined with the million-token context makes it particularly useful for tasks like analyzing full codebases, processing lengthy legal documents, or working through research papers with embedded figures and tables.

Where to Use It

Gemini 3.1 Pro is available through the Gemini API in Google AI Studio, Vertex AI, the Gemini app, Gemini CLI, Android Studio, and NotebookLM. Enterprise users get the model through Vertex AI with all the usual compliance and data residency guarantees.

Pricing follows the standard Gemini tier structure, with costs varying based on the thinking level selected. The Medium tier hits a sweet spot for most production workloads — better output quality than Low mode at a fraction of the High mode cost.
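Since cost varies with the thinking level, a team could compare tiers with a small cost model. The rate table below is placeholder data — the announcement gives no actual prices — so only the structure of the calculation is the point:

```python
# Placeholder cost model: the article says cost varies by thinking level
# but gives no prices, so the rates below are made-up illustrative
# numbers (USD per 1M tokens), NOT Google's actual pricing.

PLACEHOLDER_RATES = {  # (input_rate, output_rate) per 1M tokens
    "low":    (1.0, 4.0),
    "medium": (2.0, 8.0),
    "high":   (4.0, 16.0),
}

def request_cost(level: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at a given thinking level under the
    placeholder rate table above."""
    in_rate, out_rate = PLACEHOLDER_RATES[level]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Compare tiers for the same 100K-in / 5K-out request:
for level in ("low", "medium", "high"):
    print(level, round(request_cost(level, 100_000, 5_000), 4))
```

With real rates substituted in, this kind of comparison is how a team would verify the "Medium as production sweet spot" claim for their own traffic mix.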

The Bigger Picture

Gemini 3.1 Pro is Google's answer to the reasoning race that has defined AI development over the past year. The ARC-AGI-2 score alone puts it in a different category from most competing models — this is a benchmark specifically designed to be hard for AI systems, and reaching roughly two and a half times the previous score is a significant achievement.

The three-tier thinking system is equally important from a practical standpoint. Instead of forcing developers to choose between a fast-but-dumb model and a slow-but-smart one, Google is offering a single model with a dial. That simplicity could matter a lot for teams that don't want to manage multiple model deployments.

Full details are available on the Google AI blog.