Google DeepMind Releases Gemma 4 Under A Fully Permissive Apache 2.0 License

On April 2, 2026, Google DeepMind released Gemma 4, a family of four open-weight AI models ranging from edge-device scale up to 31 billion parameters, available for the first time under the fully permissive Apache 2.0 license.

Key Takeaways

  • Gemma 4 ships in four sizes: E2B, E4B, 26B MoE, and 31B Dense, all under the Apache 2.0 license.
  • The 31B Dense model ranks third among all open models on the Arena AI text leaderboard.
  • Google and NVIDIA collaborated to optimize all Gemma 4 variants for RTX GPUs and DGX Spark.
  • Gemma models have been downloaded over 400 million times since the first launch in 2024.

Google DeepMind has launched Gemma 4, its most capable open-weight model family to date, built from the same research and technology as Gemini 3. The family is designed to run across the full hardware spectrum, from Android smartphones to a single professional GPU.

According to the Google Blog, this is the first time the Gemma series is released under the Apache 2.0 license. This shift positions Gemma 4 as Google’s strongest move into the enterprise open-source AI market, competing directly with models like Llama 4 from Meta, which have so far dominated developer attention.

What Gemma 4 Brings to the Table Technically

The Google Blog confirms that Gemma 4 comes in four variants:

  • Effective 2B and Effective 4B, built with Qualcomm, MediaTek, and the Pixel team. These are designed for edge devices like Android phones, Raspberry Pi, and Jetson Nano, delivering near-zero latency.
  • A 26B Mixture-of-Experts (MoE) model with 3.8 billion active parameters per inference.
  • A 31B Dense model, which ranks third among open models on the Arena AI text leaderboard, while the 26B MoE ranks sixth per Google.

All four models support native image and video processing, with the two edge models additionally supporting native audio input for speech recognition.

Context windows are large: 128K tokens for the E2B and E4B models, and up to 256K tokens for the 26B and 31B models. This makes the larger models ideal for local AI-assisted coding workflows, capable of processing full repositories or long documents in a single prompt.
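As a rough sketch of what a 256K-token window buys for repository-scale prompts, the check below uses the common ~4-characters-per-token heuristic; this ratio is an assumption for illustration, not an official Gemma tokenizer figure.

```python
# Rough check of whether a set of source files fits in a 256K-token
# context window, using a characters-per-token heuristic.

CONTEXT_WINDOW = 256_000   # stated window for the 26B and 31B models
CHARS_PER_TOKEN = 4        # rough rule of thumb; real tokenizers vary

def estimated_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(files: dict[str, str], reserve: int = 8_000) -> bool:
    """True if all files, plus a reserve for the prompt and the model's
    reply, fit within the context window."""
    total = sum(estimated_tokens(body) for body in files.values())
    return total + reserve <= CONTEXT_WINDOW

# A small hypothetical repo easily fits; a multi-megabyte one would not.
repo = {"main.py": "print('hello')\n" * 200, "README.md": "# Demo\n" * 50}
print(fits_in_context(repo))
```

In practice the reserve matters: leaving headroom for the instructions and the generated answer avoids silently truncating the tail of a long document.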

The Google Blog also confirms support for over 140 languages, function calling, structured JSON output, and native system instructions for agentic workflows, features that echo the capabilities offered by platforms like Mistral AI’s Forge.
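Structured JSON output is typically exercised through a local runtime such as Ollama, whose `/api/chat` endpoint accepts a JSON schema in its `format` field to constrain the reply. The sketch below only builds such a request; the `gemma4` model tag is a placeholder, not a confirmed identifier.

```python
import json

# Schema the model's reply must conform to.
schema = {
    "type": "object",
    "properties": {
        "language": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["language", "summary"],
}

payload = {
    "model": "gemma4",   # placeholder tag (assumption), not confirmed
    "messages": [
        {"role": "system", "content": "Reply only with the requested JSON."},
        {"role": "user", "content": "Summarize: def add(a, b): return a + b"},
    ],
    "format": schema,    # constrains output to the schema above
    "stream": False,
}

# The payload would be POSTed to a local server such as
# http://localhost:11434/api/chat; here we only show it serializes cleanly.
print(len(json.dumps(payload)) > 0)
```

The same request shape works for function calling: tool definitions go in a `tools` list alongside `messages`, and the structured reply names the tool to invoke.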

Why Apache 2.0 Changes the Open AI Equation

As Ars Technica reported, the move to the Apache 2.0 license is the most important non-technical change in the Gemma 4 release.

Google’s earlier custom Gemma license allowed the company to revoke access and restricted some commercial uses. This made enterprises and startups cautious about building production products on top of the model family. 

However, Apache 2.0 removes those limitations. Developers and companies can now use, modify, distribute, and commercially deploy Gemma 4 without special permission or risk of Google changing the terms.

This pits Gemma 4 directly against Chinese open-weight models like Qwen from Alibaba and Moonshot’s Kimi K2.5, which Cursor recently confirmed using as the Composer 2 base.

Both of these open models have gained traction with developers because of their permissive licensing. With Apache 2.0, Gemma 4 now competes on equal legal footing.

NVIDIA Accelerates Gemma 4 for Local Agentic AI

According to the NVIDIA Official Blog, Google and NVIDIA worked together to optimize all four Gemma 4 variants for NVIDIA hardware. This includes consumer RTX PCs, the DGX Spark AI supercomputer, and Jetson Orin Nano edge modules.

NVIDIA’s Tensor Cores speed up AI inference for Gemma 4, delivering up to 2.7 times better performance on an RTX 5090 compared to an Apple M3 Ultra running llama.cpp.

The company also confirmed day one support across Ollama, llama.cpp, vLLM, and Unsloth Studio for fine-tuning and local deployment.

A key use case is running private AI agents through NVIDIA NemoClaw on the OpenClaw desktop platform. Gemma 4 integrates fully with the ecosystem, enabling private, hardware-accelerated autonomous agents to process local files and apps without cloud data transfers.

Who Can Try Gemma 4 and How

Gemma 4 is available immediately and can be downloaded to run locally, with no subscription or cloud fees required.

The 31B and 26B models are accessible through Google AI Studio, while the E4B and E2B edge models are available in the Google AI Edge Gallery on both iOS and Android. Model weights can also be downloaded via Hugging Face, Kaggle, and Ollama.

Since the first Gemma launch in 2024, the community has created over 100,000 model variants, which Google calls the “Gemmaverse.” This highlights the high level of developer engagement with the model family, which Gemma 4 is designed to further expand.

Source: Gemma 4: Byte for byte

Fawad Malik

Fawad Malik is a digital marketing professional with over 15 years of industry experience, specializing in SEO, SaaS, AI, content strategy, and online branding. He is the Founder and CEO of WebTech Solutions, a leading digital marketing agency committed to helping businesses grow through innovative digital strategies. Fawad shares insights on the latest trends, tools, guides and best practices in digital marketing to help marketers and online entrepreneurs worldwide. He tends to share the latest tech news, trends, and updates with the community built around NogenTech.
