A Google bot-authored LiteRT-LM pull request references Gemma4 and AIcore NPU support, while multiple posts claim a largest version around 120B total and 15B active parameters. Engineers targeting on-device inference should wait for a formal model card before locking plans.

The concrete signal is narrow but real. The screenshots shared in the original post and a second capture show an open PR in google-ai-edge/LiteRT-LM titled "Add NPU support for AIcore for Gemma4 model," with a comment from copybara-service[bot], Google's internal sync bot, repeating the same text. The image OCR in both posts describes Copybara-Service as a helper app for Google Copybara that synchronizes repositories maintained by Google, which makes this look like an internal-to-public repo sync rather than a random third-party fork.
For engineers, the interesting part is not just the string "Gemma4." It is the coupling of Gemma4 with LiteRT-LM, NPU support, and AIcore in the PR title itself. That suggests Google is plumbing runtime support for a new model family into its lightweight inference stack before, or alongside, a public release.
The parameter details are still rumor, not announcement. In one supporting post, the claim is that Gemma 4's biggest model will be "around 120B total" with "15B active" parameters; another post repeats "120b in total, 15b active parameters." If accurate, that would imply an MoE-style architecture where only a subset of parameters is active per token.
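To make the total-versus-active distinction concrete, here is a minimal sizing sketch. The expert count, per-expert size, and shared-parameter size below are hypothetical numbers chosen only so the arithmetic lands near the rumored 120B/15B split; nothing about Gemma 4's actual architecture is public.

```python
def moe_active_params(total_experts, experts_per_token, expert_size, shared_size):
    """In an MoE model, every expert contributes to total parameters,
    but only the experts routed to a given token count as active."""
    total = shared_size + total_experts * expert_size
    active = shared_size + experts_per_token * expert_size
    return total, active

# Illustrative values only, not the real Gemma 4 configuration:
# 33 experts of 3.5B each plus 4.5B shared (embeddings, attention, etc.)
total, active = moe_active_params(
    total_experts=33, experts_per_token=3,
    expert_size=3.5e9, shared_size=4.5e9,
)
print(f"total ≈ {total/1e9:.0f}B, active ≈ {active/1e9:.0f}B")
# → total ≈ 120B, active ≈ 15B
```

The practical upshot is that per-token compute and activation memory scale with the active count, while weight storage scales with the total, which is why a 120B MoE can be far cheaper to serve than a 120B dense model.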
What is missing matters just as much. None of the evidence includes an official Google post, model card, context window, tokenizer details, benchmark table, quantization guidance, license update, or API availability. So the sizing rumor is useful as an early planning signal, but it does not yet answer deployability questions.
The strongest timing hint comes from a launch-week post quoting Google DeepMind's Logan Kilpatrick saying it is "going to be a fun week of launches." Read together with the LiteRT-LM PR, that makes an imminent Gemma 4 reveal plausible.
But the evidence still describes a pre-release state. There are no published weights, no serving endpoints, and no reproducible evals attached to the leak. Right now the actionable facts are limited to a Google-linked LiteRT-LM PR mentioning "Gemma4" and "AIcore" NPU support, plus an unverified large-model sizing claim circulating in social posts.
Miles added ROCm support for AMD Instinct clusters and reported GRPO post-training gains on Qwen3-30B-A3B, including AIME rising from 0.665 to 0.729. It matters if you are looking to move rollout-heavy RL jobs off NVIDIA hardware and want concrete throughput and step-time numbers before porting.
release: OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.
release: Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
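The general technique behind this kind of index can be sketched briefly: build an inverted index from character n-grams to the files containing them, intersect posting sets to get a small candidate list, then confirm with a real scan. This is a toy illustration of that idea; Cursor's actual n-gram sizes, Bloom-filter layout, and on-disk format are not public, and the class and method names here are invented for the example.

```python
from collections import defaultdict

def trigrams(s):
    """All length-3 character substrings of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    """Toy trigram inverted index for fast literal-substring candidate filtering."""

    def __init__(self):
        self.postings = defaultdict(set)  # trigram -> set of file paths
        self.files = {}                   # path -> file contents

    def add(self, path, text):
        self.files[path] = text
        for g in trigrams(text):
            self.postings[g].add(path)

    def search_literal(self, needle):
        # A file can contain the needle only if it contains every
        # trigram of the needle; intersect postings to get candidates,
        # then confirm each candidate with an exact scan.
        grams = trigrams(needle)
        if not grams:
            candidates = set(self.files)
        else:
            candidates = set.intersection(*(self.postings[g] for g in grams))
        return sorted(p for p in candidates if needle in self.files[p])

idx = TrigramIndex()
idx.add("a.py", "def instant_grep(): pass")
idx.add("b.py", "print('hello world')")
print(idx.search_literal("instant"))  # → ['a.py']
```

A full regex engine would first extract required literals from the pattern, use the index to prune the file set, and only then run the regex over the survivors; Bloom filters can further shrink the per-file membership test at the cost of occasional false positives.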
breaking: ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
breaking: Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
Gemma 4 has been spotted on GitHub. The PR appears to be from Google’s bot account
Yes, Gemma 4 will soon be released. This time their largest size might be around 120B total with 15B active.
Google Gemma 4 incoming! Let’s go!
Google DeepMind’s Logan Kilpatrick says it’s “going to be a fun week of launches.” We have already spotted Gemma 4 (see below), so we could potentially also see Gemini 3.1 Pro GA, Gemini Flash 3.1 GA, Gemini Pro 3.2 preview, and more. What AI models are you hoping to see this week?