Project Nirvana

Privacy-First Local-First Inference for AI Agents

"Privacy is not something that I'm merely entitled to, but something I deeply believe in."
— Bruce Schneier

The Problem

Default OpenClaw transmits 2,000–5,000 tokens per query to cloud APIs. Your identity files, conversation history, memory, and sensitive context get sent unencrypted over the wire to third-party inference providers every single turn.

Even with a well-intentioned provider, this creates a massive attack surface:

Data breaches: One API compromise exposes all your sessions
Lateral movement: Your identity data is now a target for sophisticated adversaries
Inference extraction: Attackers can learn your agent's personality, habits, and decision patterns
Compliance risk: HIPAA, GDPR, SOX — if you're handling regulated data, you're violating the law by default

The Solution: Local-First with Smart Fallback

Project Nirvana keeps 80%+ of computation local on your hardware, using cloud APIs only as a stateless fallback for frontier reasoning tasks.

When fallback is necessary, the context-stripper removes:

Identity files (SOUL.md, USER.md, AGENTS.md)
Sensitive memory (personal notes, financial data, credentials)
Session history (conversation context)
Custom instructions and personality

Result: 85% token reduction + 100% privacy preservation.

What You Get

Plugin (v1.0.0)

Bundles Ollama + qwen2.5:7b with intelligent routing:

Runs local inference by default (CPU or GPU)
Automatically falls back to Claude Haiku for complex tasks
Strips identity before sending to cloud
Works offline — cloud APIs optional, not required
Drop-in replacement for default OpenClaw

Installation:

openclaw plugins install ShivaClaw/nirvana

Skill (Lightweight Context-Stripper)

If you already have a local LLM running (Ollama, vLLM, local Claude.cpp), install the skill for context-stripping only:

Removes identity before cloud fallback
Minimal overhead (~50ms per query)
Works with any OpenClaw-compatible LLM

Architecture: Local-First Router

Tier 1 (Default): Ollama on local hardware (qwen2.5:7b, ~4 tokens/sec)

Tier 2 (Fallback): Cloud APIs with context stripping:

Claude Haiku 4.5 (high-stakes reasoning)
Gemini 2.5 Flash (redundancy)
Grok 3 Mini (speed)

Decision Logic:

First attempt: Local Ollama (always)
If local timeout or quality threshold not met: Strip identity, try cloud fallback
If cloud unavailable: Serve stale response from local cache
Offline mode: Pure local, no cloud calls at all

Real-World Impact

Before Nirvana: 50+ API calls/day to OpenAI, each carrying full identity context. ~$50/month in API costs. Full session data exposed.

After Nirvana: 80%+ queries handled locally. 10–15 API calls/day (fallback only). ~$5/month in API costs. Zero identity data transmitted.

Hardware Requirement: 8GB RAM minimum (Ollama + qwen2.5:7b uses ~5GB). CPU-only inference at 3–4 tokens/sec. GPU optional but recommended for 20+ tok/sec.

Why This Matters

The premise of Project Trident is that agents should become your partners over time. They learn your preferences, your goals, your personality. That data is extremely valuable — to attackers, to competitors, to advertisers.

You should own that data. Not OpenAI. Not Google. Not Anthropic.

Nirvana is our answer: keep computation local by default, use cloud APIs only when you need their raw capability, and never send your identity to third parties.

Get Started

Project Nirvana is open-source and production-ready.

GitHub Repository ClawHub Registry