Google DeepMind Releases Gemma 4: A New Open-Source Benchmark at 30B Parameters
In the early hours of today, Google DeepMind unveiled Gemma 4 — the latest generation of its open-source model family. With roughly 30 billion parameters, it challenges the leading open-source models while sharing the same underlying technology as Google's closed-source flagship, Gemini. Weights are fully public: anyone can download, modify, and deploy them.
The previous generation, Gemma 3, launched in March 2025. In the year since, several Chinese open-source labs iterated multiple times, quietly eroding Google's presence in the open-source race. Gemma 4 is Google's answer.
Four Models, Full Spectrum
Google released four variants in one drop — E2B, E4B, 26B, and 31B — covering everything from smartphones to workstations. Critically, the license has been upgraded from Google's proprietary agreement to Apache 2.0.

Large Model Group
- 31B Dense — 31 billion fully-activated parameters, 60 layers, 256K context. Quality ceiling model; ranked #3 on the Arena AI open-source leaderboard. Unquantized bfloat16 weights fit on a single 80GB H100; quantized versions run on consumer GPUs.
- 26B A4B MoE — 25.2B total / 3.8B active parameters, Mixture-of-Experts architecture (128 experts, 8 routed + 1 shared activated per token; see the routing sketch after this list), 30 layers, 256K context. Inference speed comparable to a 4B model, quality far exceeding 4B. Ranked #6 on the leaderboard.
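The "8 routed + 1 shared" pattern means each token is handled by eight experts chosen by a router plus one always-on shared expert. The PyTorch sketch below illustrates that routing pattern only; the layer sizes, expert structure, and naive per-token loop are illustrative assumptions, not the released Gemma 4 architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy top-k MoE block: many routed experts, a few chosen per token,
    plus one shared expert that every token always passes through."""

    def __init__(self, d_model=256, d_ff=512, n_experts=128, top_k=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.top_k = top_k

    def forward(self, x):                        # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the chosen experts
        out = self.shared_expert(x)              # shared expert always contributes
        for t in range(x.size(0)):               # naive per-token loop, for clarity only
            for k in range(self.top_k):
                expert = self.experts[idx[t, k].item()]
                out[t] = out[t] + weights[t, k] * expert(x[t])
        return out

layer = MoELayer()
tokens = torch.randn(4, 256)
print(layer(tokens).shape)                       # torch.Size([4, 256])
```

Only the selected experts run for a given token, which is why a 25.2B-parameter model can decode at roughly the speed of a 4B dense model.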

Small Model Group
- E4B — 8B total / 4.5B effective parameters, 42 layers, 128K context. The "E" stands for Effective — Per-Layer Embeddings technology means active parameters are far fewer than total parameters.
- E2B — 5.1B total / 2.3B effective parameters, 35 layers, 128K context. Memory footprint can drop below 1.5GB on select devices.
All four models accept image and video input and cover 140+ languages. The small models (E2B/E4B) add an ~300M-parameter audio encoder for speech recognition and translation (clips up to 30 seconds). The large models do not include audio — a deliberate product decision: voice is a must-have on mobile, not on workstations.
Google, the Pixel team, Qualcomm, and MediaTek co-optimized on-device deployment. E2B and E4B run fully offline on phones, Raspberry Pi, and NVIDIA Jetson Orin Nano.
Benchmark Results
Compared to Gemma 3 27B, the improvements across core metrics are generational:
- Math (AIME 2026): 31B scores 89.2% vs. 20.8% for Gemma 3 27B
- Code: Codeforces Elo from 110 → 2150; LiveCodeBench v6 from 29.1% → 80.0% — the biggest leap of any category
- Reasoning (GPQA Diamond): 42.4% → 84.3%; MMLU Pro: 67.6% → 85.2%
- Vision (MMMU Pro): 49.7% → 76.9%; Document OCR (OmniDocBench, lower is better): 0.365 → 0.131
- Long context (MRCR v2 128K): 13.5% → 66.4% — previously a Gemma weakness, now largely closed
- Multilingual (MMMLU): 70.7% → 88.4%
The 26B MoE trails the 31B by only 2–5 percentage points on most benchmarks, but runs significantly faster — making it the better choice for latency-sensitive deployments. The E4B achieves MMLU Pro of 69.4% with just 4.5B effective parameters, approaching the previous-generation 27B's level.
Core Capabilities
Reasoning & Thinking Mode
All four models include a toggleable thinking mode. When enabled, the model outputs internal reasoning before delivering its final answer — the same capability lineage as Gemini's thinking feature. Particularly effective for math, logic, and multi-step planning tasks.
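The switch itself lives in the chat template. As a minimal sketch, assuming a Hugging Face Transformers workflow, a hypothetical Hub id, and an `enable_thinking` template flag (none of these names are confirmed for Gemma 4), toggling the mode could look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; the actual Hub name may differ.
model_id = "google/gemma-4-31b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "A train leaves at 9:40 and arrives at 12:05. How long is the trip?"}]

# Some open models expose the thinking switch as a chat-template flag;
# `enable_thinking` is an assumption here, not a confirmed Gemma 4 parameter.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With the flag off, the model answers directly; with it on, the reasoning trace precedes the final answer and can be stripped before display.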
Agent Workflows
Native function calling and structured JSON output are supported out of the box. Google simultaneously released the Agent Development Kit (ADK), an open-source agent framework. On-device E2B/E4B models can run agent workflows; demo applications are already available in Google AI Edge Gallery.
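As a rough illustration of the function-calling flow, here is a sketch using the `ollama` Python client with a made-up tool and a hypothetical model tag; the published tag and exact response fields may differ.

```python
import ollama

MODEL = "gemma4:26b"  # hypothetical tag; check the Ollama library for the real name

# A single tool described with a JSON-schema signature the model can choose to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Do I need an umbrella in Zurich today?"}],
    tools=tools,
)

# If the model decided to call the tool, the call comes back as structured arguments.
for call in (response.message.tool_calls or []):
    print(call.function.name, call.function.arguments)
```

The same structured-output path is what ADK builds on when chaining tool calls into multi-step agent runs.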
Code Generation
Codeforces Elo 2150 and LiveCodeBench 80.0% make Gemma 4 genuinely usable for code completion and generation — offline.
Multimodal Understanding
All models process images and video (frame-by-frame, up to 60 seconds). Images support variable resolution and aspect ratios; visual token budget is manually configurable across five tiers (70–1120 tokens) — trade speed for precision or vice versa. Key use cases: OCR, document parsing, chart understanding.
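For the document-parsing use case, a minimal sketch via the Transformers image-text-to-text pipeline might look like the following; the checkpoint id and image URL are placeholders, and the five-tier visual token budget described above is not exercised here.

```python
from transformers import pipeline

# Hypothetical checkpoint id; the actual Hub name may differ.
pipe = pipeline("image-text-to-text", model="google/gemma-4-e4b-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/invoice.png"},  # placeholder image
        {"type": "text", "text": "Extract the invoice number and total as JSON."},
    ],
}]

out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```

Lower token-budget tiers speed up prefill at the cost of fine detail, which matters most for dense documents and small text.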
Long Document Handling
Large models support 256K context; small models 128K. The architecture uses hybrid attention (alternating local sliding-window + global attention), with unified KV and Proportional RoPE on global layers to optimize long-context memory usage.
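A back-of-the-envelope estimate shows why the alternating layout matters for long-context memory. The layer count below matches the 31B spec; the head count, head dimension, window size, and local-to-global ratio are illustrative assumptions, not disclosed values.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, window, global_every, dtype_bytes=2):
    """Rough KV-cache estimate for a hybrid local/global attention stack.

    Global layers cache keys/values for the full sequence; sliding-window layers
    only cache the last `window` tokens. All shape numbers are assumptions.
    """
    per_token = 2 * n_kv_heads * head_dim * dtype_bytes   # K and V per token per layer
    total = 0
    for layer in range(n_layers):
        is_global = (layer + 1) % global_every == 0        # e.g. one global layer per six
        cached = seq_len if is_global else min(seq_len, window)
        total += cached * per_token
    return total

full_global = kv_cache_bytes(60, 8, 128, 262_144, 262_144, 1)   # every layer global
hybrid      = kv_cache_bytes(60, 8, 128, 262_144, 1024, 6)      # mostly sliding-window
print(f"all-global: {full_global / 2**30:.1f} GiB, hybrid: {hybrid / 2**30:.1f} GiB")
```

Under these assumed shapes, the 256K-token cache shrinks from roughly 60 GiB to about 10 GiB, which is the kind of saving that makes long context viable outside a datacenter.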

Multilingual
Natively trained on 140+ languages. MMMLU score: 88.4%.
Apache 2.0: A Statement, Not Just a License
Gemma 1, 2, and 3 all used Google's proprietary license — commercial use was permitted but with attached conditions. Gemma 4 switches to Apache 2.0, one of the most commercially permissive open-source licenses recognized by the developer community. Free to modify, distribute, and commercialize, with no user-count thresholds.
Hugging Face co-founder Clément Delangue called it a major milestone. The shift from three generations of custom licensing to Apache 2.0 sends an unambiguous signal: Google has answered a two-year debate about how serious big tech is about open source.
The Open-Source Competitive Landscape
On the Arena AI open-source leaderboard, Gemma 4 31B ranks #3 and 26B MoE ranks #6. The models ahead are primarily from Chinese labs.
Current major open-source competitors include DeepSeek (V3.2 in production, V4 incoming), Qwen 3.5, GLM-5.1, MiniMax M2.5, and Kimi K2.5 — all of which released new versions in rapid succession around Chinese New Year, ranging from tens to hundreds of billions of parameters, each with distinct strengths in reasoning, code, and agent tasks.
Gemma 4's ceiling at 31B is a real constraint. But its differentiation lies elsewhere: chip-level partnerships with Qualcomm and MediaTek, native Android ecosystem integration, Apache 2.0 compliance, and the deepest engineering investment in on-device deployment of any model in this class. Training data cuts off January 2025; the composition of training data has not been disclosed.
Where to Try It
- Online: Google AI Studio (31B, 26B) · Google AI Edge Gallery App (E4B, E2B)
- Download: Hugging Face · Kaggle · Ollama
- Cloud deployment: Vertex AI · Cloud Run · GKE
- Android development: AICore Developer Preview (forward-compatible with Gemini Nano 4)
- Inference frameworks: Hugging Face Transformers · vLLM · llama.cpp · MLX · Ollama · NVIDIA NIM · LM Studio · Unsloth · SGLang
The Gemma series has surpassed 400 million cumulative downloads and spawned over 100,000 community variants. A Gemma 4 Good Challenge is now live on Kaggle, inviting developers to build socially impactful projects with the new models.