Gemma Gem Browser AI
Skill by ara.so — Daily 2026 Skills collection.
Gemma Gem is a Chrome extension that runs Google's Gemma 4 model entirely on-device via WebGPU. It injects a chat overlay into every page and exposes a tool-calling agent loop that can read pages, click elements, fill forms, execute JavaScript, and take screenshots — all without sending data to any server.
Architecture Overview
```
Offscreen Document          Service Worker          Content Script
(Gemma 4 + Agent Loop) <-> (Message Router)  <->  (Chat UI + DOM Tools)
         |                        |
   WebGPU inference        Screenshot capture
   Token streaming         JS execution
```

- Offscreen document (`offscreen/`): Loads the ONNX model via `@huggingface/transformers`, runs the agent loop, streams tokens.
- Service worker (`background/`): Routes messages, handles `take_screenshot` and `run_javascript`.
- Content script (`content/`): Injects shadow DOM chat UI, executes DOM tools.
- `agent/`: Zero-dependency module defining the `ModelBackend` and `ToolExecutor` interfaces; extractable as a standalone library.
Install & Build
```bash
# Prerequisites: Node.js 18+, pnpm
pnpm install

# Development build (logging active, source maps)
pnpm build

# Production build (errors only, minified)
pnpm build:prod
```

Load the extension:
1. Open `chrome://extensions`
2. Enable **Developer mode**
3. Click **Load unpacked** → select `.output/chrome-mv3-dev/`

**Model download** happens automatically on first chat open:
- `onnx-community/gemma-4-E2B-it-ONNX`: ~500 MB (default)
- `onnx-community/gemma-4-E4B-it-ONNX`: ~1.5 GB

Models are cached in the browser's cache storage after the first run.

Key Interfaces (`agent/`)
ModelBackend
```typescript
// agent/types.ts
export interface ModelBackend {
  generate(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options: GenerateOptions
  ): AsyncGenerator<StreamChunk>;
}

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
}

export interface GenerateOptions {
  maxNewTokens?: number;
  thinking?: boolean;
}
```
ToolExecutor
```typescript
// agent/types.ts
export interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}
```
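Because `ToolExecutor` is a single-method interface, it can be satisfied outside the browser, which is handy for exercising the agent code in isolation. The following is a minimal sketch, not the project's implementation: `createRegistryExecutor` and `ToolHandler` are hypothetical names, and the interface is redeclared only to keep the example self-contained.

```typescript
// Mirrors the ToolExecutor interface from agent/types.ts so the sketch stands alone.
interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}

// Hypothetical handler type: one function per tool name.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical helper: wraps a name -> handler map in the ToolExecutor interface.
function createRegistryExecutor(handlers: Map<string, ToolHandler>): ToolExecutor {
  return {
    async execute(toolName, args) {
      const handler = handlers.get(toolName);
      if (!handler) throw new Error(`Unknown tool: ${toolName}`);
      // Handlers may be sync or async; the async wrapper normalizes both to a Promise.
      return handler(args);
    },
  };
}

// Usage: an executor with one fake tool, no browser APIs involved.
const fakeExecutor = createRegistryExecutor(
  new Map<string, ToolHandler>([["get_page_title", () => "Example Domain"]])
);
```

Unknown tool names reject the returned promise instead of resolving to a sentinel, so a caller cannot mistake a missing tool for an empty result.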
Agent Loop
```typescript
// agent/loop.ts (simplified illustration)
export async function* runAgentLoop(
  userMessage: string,
  history: ChatMessage[],
  model: ModelBackend,
  tools: ToolExecutor,
  toolDefs: ToolDefinition[],
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages: ChatMessage[] = [...history, { role: "user", content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    let calledTool = false;
    for await (const chunk of model.generate(messages, toolDefs, {})) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        calledTool = true;
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        // Feed the result back so the next generate() call can use it.
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") break;
    }
    // A turn without tool calls is the final answer. Otherwise run another
    // turn so the model sees the tool results (bounded by maxIterations).
    if (!calledTool) return;
  }
}
```
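Since the loop only depends on the `ModelBackend` interface, it can be driven by a scripted stand-in instead of a real model. The sketch below is an illustration under assumptions: the `StreamChunk` shape is inferred from how the loop uses it, and `createScriptedBackend` and `collectTurn` are hypothetical helpers, not part of the project.

```typescript
// Assumed chunk shape, inferred from the fields the agent loop reads.
type StreamChunk =
  | { type: "token"; token: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> }
  | { type: "done" };

// Structural stand-in for ModelBackend, kept loose so the sketch is self-contained.
interface ScriptedBackend {
  generate(messages: unknown[], tools: unknown[], options: object): AsyncGenerator<StreamChunk>;
}

// Hypothetical test double: each generate() call replays the next scripted turn.
function createScriptedBackend(turns: StreamChunk[][]): ScriptedBackend {
  let turnIndex = 0;
  return {
    async *generate() {
      const turn = turns[turnIndex++] ?? [];
      for (const chunk of turn) yield chunk;
      const done: StreamChunk = { type: "done" };
      yield done;
    },
  };
}

// Drains a single generate() call into an array.
async function collectTurn(backend: ScriptedBackend): Promise<StreamChunk[]> {
  const chunks: StreamChunk[] = [];
  for await (const chunk of backend.generate([], [], {})) chunks.push(chunk);
  return chunks;
}

// Usage: first turn requests a tool, second turn answers in plain text.
const scripted = createScriptedBackend([
  [{ type: "tool_call", name: "get_page_title", args: {} }],
  [{ type: "token", token: "The title is Example Domain." }],
]);
```

A backend like this makes it possible to check the tool-then-answer flow without WebGPU or a model download.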
Built-in Tools
| Tool | Location | Description |
|---|---|---|
| `read_page_content` | Content script | Read page text/HTML, optionally scoped to a CSS selector |
| `take_screenshot` | Service worker | Capture visible tab as PNG |
| `click_element` | Content script | Click by CSS selector |
| `type_text` | Content script | Type into input by CSS selector |
| | Content script | Scroll by pixel amount |
| `run_javascript` | Service worker | Execute JS in page context |
Adding a New Tool
Tools live in two places: the definition (in the offscreen agent) and the executor (in content script or service worker).
Step 1: Define the tool schema

```typescript
// offscreen/tools/definitions.ts
import type { ToolDefinition } from "../../agent/types"; // relative path assumed from the layout above

export const MY_TOOL_DEFINITION: ToolDefinition = {
  name: "get_page_title",
  description: "Returns the document title of the current page.",
  parameters: {
    type: "object",
    properties: {},
    required: [],
  },
};
```

Step 2: Register in the tool list
```typescript
// offscreen/tools/index.ts
import type { ToolDefinition } from "../../agent/types"; // relative path assumed from the layout above
import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [
  // ...existing tools
  MY_TOOL_DEFINITION,
];
```

Step 3: Implement execution in the content script
```typescript
// content/tools/executor.ts
export async function executeContentTool(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  switch (name) {
    case "get_page_title":
      return document.title;
    case "read_page_content": {
      const selector = args.selector as string | undefined;
      if (selector) {
        return document.querySelector(selector)?.textContent ?? "Not found";
      }
      return document.body.innerText;
    }
    case "click_element": {
      const el = document.querySelector<HTMLElement>(args.selector as string);
      if (!el) throw new Error(`Element not found: ${args.selector}`);
      el.click();
      return "clicked";
    }
    case "type_text": {
      const input = document.querySelector<HTMLInputElement>(args.selector as string);
      if (!input) throw new Error(`Input not found: ${args.selector}`);
      input.focus();
      input.value = args.text as string;
      // Fire the events that page scripts usually listen for.
      input.dispatchEvent(new Event("input", { bubbles: true }));
      input.dispatchEvent(new Event("change", { bubbles: true }));
      return "typed";
    }
    default:
      throw new Error(`Unknown content tool: ${name}`);
  }
}
```

Step 4: Handle service-worker-side tools
```typescript
// background/tools.ts
export async function executeSwTool(
  name: string,
  args: Record<string, unknown>,
  tabId: number
): Promise<unknown> {
  switch (name) {
    case "take_screenshot": {
      // Resolves to a PNG data URL of the visible area of the active tab.
      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
      return dataUrl;
    }
    case "run_javascript": {
      // `new Function` is blocked by the extension CSP inside the service worker,
      // so the code string is passed as an argument and evaluated in the page's
      // main world instead. The page's own CSP may still forbid this.
      const results = await chrome.scripting.executeScript({
        target: { tabId },
        world: "MAIN",
        args: [args.code as string],
        func: (code: string) => new Function(code)(),
      });
      return results[0]?.result ?? null;
    }
    default:
      return null; // not a SW tool: forward to content script
  }
}
```
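In the Step 4 example, `null` means "not a service-worker tool", but `run_javascript` can also legitimately return `null`. One way to avoid that ambiguity is to decide the destination from the tool name before executing anything. This is a sketch under assumptions: `createToolRouter` and its two injected runners are hypothetical, and the set only lists the service-worker tools named in this document.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolRunner = (toolName: string, args: ToolArgs) => Promise<unknown>;

// Service-worker tools named in this document; extend the set as tools are added.
const SW_TOOL_NAMES = new Set<string>(["take_screenshot", "run_javascript"]);

// Hypothetical router. Both runners are injected so the routing decision
// can be exercised without any chrome.* APIs.
function createToolRouter(
  runInServiceWorker: ToolRunner,
  forwardToContentScript: ToolRunner
): ToolRunner {
  return (toolName, args) =>
    SW_TOOL_NAMES.has(toolName)
      ? runInServiceWorker(toolName, args)
      : forwardToContentScript(toolName, args);
}

// Usage with stub runners that just report where the call landed.
const routeTool = createToolRouter(
  async (toolName) => `sw:${toolName}`,
  async (toolName) => `content:${toolName}`
);
```

In the extension, the first runner would wrap something like `executeSwTool` and the second would wrap a `chrome.tabs.sendMessage` call.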
Message Routing Pattern
The service worker acts as a message bus. Extension contexts reach it with `chrome.runtime.sendMessage`, and it forwards tool calls to content scripts with `chrome.tabs.sendMessage`.

```typescript
// Message types (shared/messages.ts)
export type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW
chrome.runtime.sendMessage<ExtMessage>({
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId: currentTabId,
});

// SW → Content script
chrome.tabs.sendMessage<ExtMessage>(tabId, {
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId,
});
```
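Messages arriving in a listener are untyped at runtime, so the `ExtMessage` union only helps if incoming values are checked before use. A minimal sketch of such a check follows; `isExtMessage` and `isRecord` are hypothetical helpers, and the union is repeated only so the example is self-contained.

```typescript
type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// True for plain objects, false for null, arrays, and primitives.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical runtime guard: checks the discriminant, then the fields each variant needs.
function isExtMessage(value: unknown): value is ExtMessage {
  if (!isRecord(value)) return false;
  switch (value.type) {
    case "TOOL_CALL":
      return (
        typeof value.name === "string" &&
        isRecord(value.args) &&
        typeof value.tabId === "number"
      );
    case "TOOL_RESULT":
      return typeof value.name === "string" && "result" in value;
    case "TOKEN":
      return typeof value.token === "string";
    case "AGENT_DONE":
      return true;
    case "AGENT_ERROR":
      return typeof value.error === "string";
    default:
      return false;
  }
}
```

A listener can then return early on anything that fails the guard, which also filters out messages sent by other parts of the extension.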
Model Configuration
```typescript
// offscreen/model.ts: loading with transformers.js
import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {
  E2B: "onnx-community/gemma-4-E2B-it-ONNX",
  E4B: "onnx-community/gemma-4-E4B-it-ONNX",
} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(
  size: ModelSize,
  onProgress: (progress: number) => void
): Promise<TextGenerationPipeline> {
  return pipeline("text-generation", MODEL_IDS[size], {
    dtype: "q4f16",
    device: "webgpu",
    // The callback also fires for events that carry no `progress` field,
    // so only forward numeric values.
    progress_callback: (p: { progress?: number }) => {
      if (typeof p.progress === "number") onProgress(p.progress);
    },
  });
}
```
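If the progress callback reports each model file separately, a single progress value will jump back and forth as files interleave. One option is to track byte counts per file and derive an overall percentage. The sketch below assumes an event shape with `file`, `loaded`, and `total` fields; that shape and the `createProgressAggregator` helper are assumptions for illustration, not confirmed details of the extension.

```typescript
// Assumed event shape: one update per file, with byte counts.
interface FileProgress {
  file: string;
  loaded: number;
  total: number;
}

// Hypothetical helper: remembers the latest byte counts per file and
// returns the overall percentage across all files seen so far.
function createProgressAggregator(): (event: FileProgress) => number {
  const files = new Map<string, { loaded: number; total: number }>();
  return (event) => {
    files.set(event.file, { loaded: event.loaded, total: event.total });
    let loaded = 0;
    let total = 0;
    files.forEach((entry) => {
      loaded += entry.loaded;
      total += entry.total;
    });
    return total === 0 ? 0 : (loaded / total) * 100;
  };
}

// Usage: feed every progress event through the same aggregator instance.
const overallProgress = createProgressAggregator();
```

The overall value can drop when a new file is first reported, because the known total grows; a UI may want to clamp it to the highest value shown so far.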
Settings & Persistence
Settings are stored via `chrome.storage.sync`:

```typescript
export interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

export async function getSettings(): Promise<GemmaGemSettings> {
  const stored = await chrome.storage.sync.get("settings");
  // Stored values override defaults; missing keys fall back to DEFAULT_SETTINGS.
  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };
}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {
  const current = await getSettings();
  await chrome.storage.sync.set({ settings: { ...current, ...patch } });
}

// Disable extension on current host
async function disableOnCurrentSite() {
  const host = new URL(location.href).hostname;
  const settings = await getSettings();
  if (!settings.disabledHosts.includes(host)) {
    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });
  }
}
```
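The `disabledHosts` list is only useful if something consults it before injecting the chat overlay. A minimal sketch of that check is shown here; `isHostDisabled` is a hypothetical helper, and it matches exact hostnames only, mirroring how `disableOnCurrentSite` stores them.

```typescript
// Hypothetical helper: true when the URL's hostname is in the disabled list.
// Matching is exact, so "example.com" does not cover "sub.example.com".
function isHostDisabled(disabledHosts: string[], pageUrl: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(pageUrl).hostname;
  } catch {
    return false; // not a parseable URL, so nothing can match
  }
  return disabledHosts.includes(hostname);
}
```

A content script could call this with `location.href` and the stored settings, then skip injection when it returns `true`.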
Shadow DOM Chat UI Pattern
The content script injects a shadow DOM to isolate styles:

```typescript
// content/ui.ts
export function injectChatOverlay(): ShadowRoot {
  const host = document.createElement("div");
  host.id = "gemma-gem-host";

  // Prevent page styles from leaking in
  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles
  const style = document.createElement("style");
  style.textContent = CHAT_STYLES; // imported CSS string
  shadow.appendChild(style);

  // Inject chat container
  const container = document.createElement("div");
  container.id = "gemma-gem-container";
  shadow.appendChild(container);

  document.body.appendChild(host);
  return shadow;
}
```

Debugging
All logs use the `[Gemma Gem]` prefix. Development builds log info/debug/warn; production only logs errors.

```
# Service worker logs
chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)
chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs
DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts
chrome://inspect#other
```

Key things to check in offscreen document logs:
- Model download progress
- Full prompt construction
- Token counts per turn
- Raw model output (before tool call parsing)
- Tool execution results
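The prefix and the dev/production split described above can be captured in a small logger factory. This is a sketch of one way to do it, not the extension's actual logger: `createLogger`, the level ordering, and the injectable `sink` are all assumptions made for illustration.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Lowest to highest severity.
const LEVEL_ORDER: LogLevel[] = ["debug", "info", "warn", "error"];

// Hypothetical factory: drops messages below `minLevel` and prefixes the rest.
// `sink` defaults to the matching console method but can be replaced in tests.
function createLogger(
  minLevel: LogLevel,
  sink: (level: LogLevel, line: string) => void = (level, line) => console[level](line)
): (level: LogLevel, message: string) => void {
  const threshold = LEVEL_ORDER.indexOf(minLevel);
  return (level, message) => {
    if (LEVEL_ORDER.indexOf(level) < threshold) return;
    sink(level, `[Gemma Gem] ${message}`);
  };
}

// Usage: a development logger passes everything, a production logger only errors.
const devLog = createLogger("debug");
const prodLog = createLogger("error");
```

How the build selects the minimum level (for example, through a compile-time flag) is left open here.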
Common Patterns & Gotchas
WebGPU availability check:

```typescript
if (!navigator.gpu) {
  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter found.");
```

Offscreen document lifecycle: Chrome may close the offscreen document, so check that it exists (and recreate it if needed) before sending messages:

```typescript
async function ensureOffscreen() {
  const existing = await chrome.offscreen.hasDocument();
  if (!existing) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html",
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: "Run Gemma 4 inference via WebGPU",
    });
  }
}
```

Context window management: Gemma 4 supports 128K tokens, but inference slows with long contexts. Clear history per page with `clear_context`, or limit stored turns:

```typescript
const MAX_HISTORY_TURNS = 20;

function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  // One turn is counted as two messages (user + assistant).
  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;
  return messages.slice(-MAX_HISTORY_TURNS * 2);
}
```

Tool call parsing: Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:

```typescript
function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {
  try {
    const parsed = JSON.parse(raw);
    // Reject valid JSON that is not shaped like a tool call.
    if (typeof parsed?.name !== "string") return null;
    if (typeof parsed.args !== "object" || parsed.args === null) return null;
    return parsed;
  } catch {
    return null; // incomplete JSON: still streaming
  }
}
```

CSS selector safety for DOM tools:

```typescript
function safeQuerySelector(selector: string): Element | null {
  try {
    return document.querySelector(selector);
  } catch {
    return null; // invalid selector from model
  }
}
```
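Trimming by a plain `slice` can discard a leading system prompt and can leave a tool result at the start of the window with no matching call before it. The variant below sketches one way to handle both cases. The `Msg` shape (including a `"system"` role) and the `trimHistoryKeepingSystem` name are assumptions for illustration; the real `ChatMessage` type may differ.

```typescript
// Assumed message shape for this sketch.
type Role = "system" | "user" | "assistant" | "tool";
interface Msg {
  role: Role;
  content: string;
}

// Hypothetical variant of trimHistory: always keeps system messages and
// never starts the trimmed window on an orphaned tool result.
function trimHistoryKeepingSystem(messages: Msg[], maxMessages: number): Msg[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  if (rest.length <= maxMessages) return messages;

  // Keep the most recent `maxMessages` non-system messages.
  let tail = rest.slice(rest.length - maxMessages);

  // Drop leading tool results whose originating call was trimmed away.
  while (tail[0]?.role === "tool") tail = tail.slice(1);

  return [...system, ...tail];
}
```

Because orphaned tool results are dropped, the returned window may contain fewer than `maxMessages` non-system messages.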