A private AI research agent running Gemma entirely in your browser via WebGPU. Tool calling, persistent memory, and web search — all on-device.
Only your search queries touch the network (and only when you choose to). All reasoning stays on your device.
Models are cached after first download — future visits load instantly.
Requirements: Chrome 113+, Edge 113+, or Firefox 130+ with WebGPU.
Gemma 4 can call SAM (Segment Anything Model) to segment objects in attached images. Choose your SAM model in Settings — loaded on first use.
Translation works directly — Gemma 4 speaks 140+ languages natively, no tool needed.
Gemma 3 1B works as a simple chatbot — no agent tools. Prefer Ternary Bonsai 1.7B if you want tool calling at a similar size.
Documents are chunked, embedded, and stored as searchable knowledge. A summary is generated on upload.
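The actual chunk size and overlap are not documented; a minimal sketch of fixed-size chunking with overlap, with illustrative parameter values:

```typescript
// Illustrative only: the size and overlap values are assumptions,
// not LocalMind's actual chunker settings.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded and stored; the overlap keeps sentences that straddle a chunk boundary retrievable from either side.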
Custom tools are defined by four fields (name, description, parameters, endpoint). On a tool call, the model's args are POSTed to your endpoint as a JSON body; the response is fed back to the model. CORS must allow this origin.
MCP servers are queried with tools/list, and each returned tool is registered with an mcp_ prefix so the agent loop can use it alongside built-ins.
Inline ($\int x^2 dx$) and display ($$\sum_{i=1}^n i$$) math render via KaTeX; ```mermaid blocks render as SVG via lazy-loaded Mermaid.
```html / ```svg / ```artifact code blocks get a live sandboxed iframe below the code (sandbox="allow-scripts", no same-origin), so model-generated UI is safe to run inline.
The agent can call run_python to execute Python in a sandboxed Pyodide worker; numpy / pandas / matplotlib auto-install on import (~10 MB first-use download).
To expose the JS API, tick Expose window.localmind in Settings. It is same-tab only; cross-origin scripts cannot reach it. The object is frozen and non-writable; disable the toggle to detach it.
The exposed object has a loaded flag; non-streaming calls resolve to chat.completion objects, and streaming calls yield chat.completion.chunk objects. Every API call is logged in-memory (last 50): click the • API chip in the toolbar or Settings → View activity log. Each entry shows method, prompt length, tokens generated, duration, and outcome (ok / err / busy).
Open demo.html in the same folder. It iframes LocalMind, auto-flips the toggle, waits for the model, and runs both a non-streaming and a streaming completion against iframe.contentWindow.localmind.
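In page script, the flow the demo follows can be sketched as below. The method names (loaded, complete) and result shape are assumptions for illustration; consult demo.html for the real window.localmind surface.

```typescript
// Sketch of waiting for the model and running one non-streaming completion.
// "complete" is a hypothetical method name, not the documented API.
interface LocalMind {
  loaded: boolean;
  complete(prompt: string): Promise<{ object: "chat.completion"; text: string }>;
}

async function runWhenReady(lm: LocalMind): Promise<string> {
  if (!lm.loaded) throw new Error("model not loaded yet"); // the demo waits on this flag
  const result = await lm.complete("Say hello in one word.");
  return result.text;
}

// Stand-in stub so the sketch runs outside the browser:
const stub: LocalMind = {
  loaded: true,
  complete: async () => ({ object: "chat.completion", text: "Hello." }),
};
```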
Click the 🗣 button to the left of the input, speak a sentence, click again to stop. Whisper runs on-device. Try saying: “Write a short summary of the Great Pyramid of Giza in three sentences.”
If a ```mermaid block renders as plain code, the model stripped the language label; re-send with "Preserve the language tag after the backticks".
run_python, MCP tools, and multi-step planning require an agent-capable (Gemma 4) model.
Gemma runs entirely on your device; nothing is sent to any server.
Switch to a Gemma 4 model for tool calling, memory, and multimodal input.
Web search requires an API key (Tavily, Brave) or a self-hosted SearXNG instance.
Custom model repos must include ONNX weights in an onnx/ folder. Multimodal custom models are not yet supported. Added models appear in the model selector.
POST <endpoint> with the model-generated args as JSON body; the response JSON is fed back to the model. The endpoint must send CORS headers for this origin. The tool name must match [a-zA-Z_][a-zA-Z0-9_]* and must not collide with a built-in.
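On the endpoint side, a minimal Node handler satisfying this contract might look like the following. The tool behavior (echoing its arguments) and the port are hypothetical, chosen only to show the JSON-in/JSON-out and CORS requirements:

```typescript
import http from "node:http";

// Hypothetical custom-tool endpoint: accepts the model's args as a JSON body
// and returns JSON that LocalMind feeds back to the model.
const server = http.createServer(async (req, res) => {
  // CORS: allow the LocalMind origin (prefer the exact origin over "*" in production).
  res.setHeader("Access-Control-Allow-Origin", "*");
  res.setHeader("Access-Control-Allow-Headers", "Content-Type");
  if (req.method === "OPTIONS") {
    // Preflight request: reply with the CORS headers only.
    res.writeHead(204);
    res.end();
    return;
  }
  let body = "";
  for await (const chunk of req) body += chunk;
  const args = JSON.parse(body || "{}"); // model-generated arguments
  res.setHeader("Content-Type", "application/json");
  // Whatever JSON you return here is fed back to the model.
  res.end(JSON.stringify({ ok: true, echoed: args }));
});

server.listen(8787);
```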
Tool names are prefixed with mcp_ to avoid collisions with built-ins. The server must allow CORS for this origin and accept JSON-RPC 2.0 requests at the given URL. Connections are re-established on page load.
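The handshake can be sketched as a JSON-RPC 2.0 tools/list call followed by prefixed registration. The shapes below are simplified assumptions; real MCP responses carry more fields:

```typescript
// Simplified JSON-RPC 2.0 request shape for the tools/list discovery call.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function toolsListRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Each discovered tool is registered under an mcp_ prefix so it
// cannot collide with a built-in tool name.
function registeredName(toolName: string): string {
  return `mcp_${toolName}`;
}
```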