Project Nirvana
Privacy-First Local-First Inference for AI Agents
"Privacy is not something that I'm merely entitled to, but something I deeply believe in."
— Bruce Schneier
The Problem
By default, OpenClaw sends 2,000–5,000 tokens per query to cloud APIs. Your identity files, conversation history, memory, and sensitive context go to third-party inference providers on every single turn. TLS protects the request in transit, but the provider still receives and processes all of it in plaintext.
Even with a well-intentioned provider, this creates a massive attack surface:
- Data breaches: One API compromise exposes all your sessions
- Lateral movement: Your identity data is now a target for sophisticated adversaries
- Inference extraction: Attackers can learn your agent's personality, habits, and decision patterns
- Compliance risk: If you handle data regulated under HIPAA, GDPR, or SOX, sending it to a third-party provider without the required agreements and safeguards can put you out of compliance by default
The Solution: Local-First with Smart Fallback
Project Nirvana keeps 80%+ of computation local on your hardware, using cloud APIs only as a stateless fallback for frontier reasoning tasks.
When fallback is necessary, the context-stripper removes:
- Identity files (SOUL.md, USER.md, AGENTS.md)
- Sensitive memory (personal notes, financial data, credentials)
- Session history (conversation context)
- Custom instructions and personality
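Result: about 85% fewer tokens per fallback query (a 2,000–5,000 token request shrinks to roughly 300–750 tokens), with no identity data in the outbound payload.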
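The stripping step above can be sketched as a simple filter over labeled context blocks. This is a minimal illustration, not the plugin's actual implementation: the `ContextBlock` type, the `STRIPPED_KINDS` labels, and the 4-characters-per-token estimate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical category labels; the plugin's real internal names are not documented here.
STRIPPED_KINDS = {"identity", "memory", "history", "instructions"}


@dataclass(frozen=True)
class ContextBlock:
    kind: str    # "identity", "memory", "history", "instructions", or "task"
    source: str  # where the block came from, e.g. "SOUL.md"
    text: str


def strip_context(blocks: List[ContextBlock]) -> List[ContextBlock]:
    """Keep only blocks whose kind is not in STRIPPED_KINDS."""
    return [b for b in blocks if b.kind not in STRIPPED_KINDS]


def approx_tokens(blocks: List[ContextBlock]) -> int:
    """Very rough size estimate, assuming about 4 characters per token."""
    return sum(len(b.text) for b in blocks) // 4


# Illustrative input: placeholder text stands in for real file contents.
blocks = [
    ContextBlock("identity", "SOUL.md", "x" * 4000),
    ContextBlock("identity", "USER.md", "x" * 2000),
    ContextBlock("history", "session", "x" * 6000),
    ContextBlock("task", "query", "Summarize the attached design doc."),
]
outbound = strip_context(blocks)
```

Only the `task` block survives in `outbound`, so the cloud provider sees the question being asked but none of the files or history that surround it. An allow-list (send only `task`) would be a stricter variant of the same idea.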
What You Get
Plugin (v1.0.0)
Bundles Ollama + qwen2.5:7b with intelligent routing:
- Runs local inference by default (CPU or GPU)
- Automatically falls back to Claude Haiku for complex tasks
- Strips identity before sending to cloud
- Works offline — cloud APIs optional, not required
- Drop-in replacement for default OpenClaw
Installation:
openclaw plugins install ShivaClaw/nirvana
Skill (Lightweight Context-Stripper)
If you already have a local LLM running (Ollama, vLLM, llama.cpp), install the skill for context-stripping only:
- Removes identity before cloud fallback
- Minimal overhead (~50ms per query)
- Works with any OpenClaw-compatible LLM
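Before installing the skill, it can help to confirm that the local model answers on its own. The sketch below talks to Ollama's HTTP API at its default address (`http://localhost:11434`, endpoint `/api/generate`); the helper names `build_generate_request` and `generate` are hypothetical and are not part of Nirvana or OpenClaw.

```python
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default listen address


def build_generate_request(prompt, model="qwen2.5:7b", host=OLLAMA_HOST):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=60.0, **kwargs):
    """Send the request and return the generated text (needs a running Ollama)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Say hello in one sentence.")` should return text without any traffic leaving the machine, assuming Ollama is running and the `qwen2.5:7b` model has been pulled.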
Architecture: Local-First Router
Tier 1 (Default): Ollama on local hardware (qwen2.5:7b, ~3–4 tokens/sec on CPU)
Tier 2 (Fallback): Cloud APIs with context stripping:
- Claude Haiku 4.5 (high-stakes reasoning)
- Gemini 2.5 Flash (redundancy)
- Grok 3 Mini (speed)
Decision Logic:
- First attempt: Local Ollama (always)
- If the local model times out or misses the quality threshold: Strip identity, then try cloud fallback
- If cloud unavailable: Serve stale response from local cache
- Offline mode: Pure local, no cloud calls at all
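The decision logic above can be sketched as a single routing function. This is an illustration under stated assumptions rather than the plugin's real router: the exception types, the callable parameters, and the choice to return a below-threshold local answer in offline mode are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class LocalTimeout(Exception):
    """Raised by the local backend when it exceeds its time budget."""


class CloudUnavailable(Exception):
    """Raised by the cloud backend when no provider can be reached."""


def route(
    prompt: str,
    local: Callable[[str], str],
    cloud: Callable[[str], str],
    strip: Callable[[str], str],
    quality_ok: Callable[[str], bool],
    cache: Dict[str, str],
    offline: bool = False,
) -> Tuple[str, Optional[str]]:
    """Return (source, answer), where source is 'local', 'cloud', or 'cache'."""
    # Tier 1: always try the local model first.
    local_answer: Optional[str] = None
    try:
        local_answer = local(prompt)
    except LocalTimeout:
        pass
    if local_answer is not None and quality_ok(local_answer):
        cache[prompt] = local_answer
        return "local", local_answer

    # Offline mode: never call the cloud. Assumption: a weak local answer
    # is still returned; on timeout we fall back to the cache.
    if offline:
        if local_answer is not None:
            return "local", local_answer
        return "cache", cache.get(prompt)

    # Tier 2: strip identity, then try the cloud fallback.
    try:
        cloud_answer = cloud(strip(prompt))
    except CloudUnavailable:
        # Cloud unavailable: serve a stale cached response, if one exists.
        return "cache", cache.get(prompt)
    cache[prompt] = cloud_answer
    return "cloud", cloud_answer
```

Note that `strip` is applied inside `route`, so no caller can reach the cloud tier with an unstripped prompt. The cache is keyed on the original prompt here for simplicity; a real cache would likely normalize or hash it.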
Real-World Impact
Before Nirvana: 50+ API calls/day to OpenAI, each carrying full identity context. ~$50/month in API costs. Full session data exposed.
After Nirvana: 80%+ queries handled locally. 10–15 API calls/day (fallback only). ~$5/month in API costs. Zero identity data transmitted.
Hardware Requirement: 8GB RAM minimum (Ollama + qwen2.5:7b uses ~5GB). CPU-only inference runs at 3–4 tokens/sec. A GPU is optional but recommended, and reaches 20+ tokens/sec.
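A back-of-the-envelope calculation shows where these figures come from and what they mean in practice. The parameter count (about 7.6 billion) and the 4.5 bits per weight for a 4-bit quantized build are assumptions for illustration; check the model card for the exact values. The throughput numbers are the ones quoted above.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def seconds_for(reply_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate a reply of the given length at a steady rate."""
    return reply_tokens / tokens_per_sec


# Assumed: ~7.6B parameters at ~4.5 bits per weight (4-bit quantization plus overhead).
model_gb = weights_gb(7.6, 4.5)

# A 500-token reply at the CPU and GPU rates quoted above.
cpu_seconds = seconds_for(500, 4)
gpu_seconds = seconds_for(500, 20)
```

Under these assumptions the weights alone take a little over 4GB, and the KV cache and runtime account for the rest of the ~5GB figure. A 500-token reply takes about two minutes on CPU versus under half a minute on GPU, which is why a GPU is recommended for interactive use.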
Why This Matters
The premise of Project Trident is that agents should become your partners over time. They learn your preferences, your goals, your personality. That data is extremely valuable — to attackers, to competitors, to advertisers.
You should own that data. Not OpenAI. Not Google. Not Anthropic.
Nirvana is our answer: keep computation local by default, use cloud APIs only when you need their raw capability, and never send your identity to third parties.
Get Started
Project Nirvana is open-source and production-ready.