February 12, 2026 · Nick Rae · 10 min read

My OpenClaw Setup: From Chat Assistant to Autonomous Co-Pilot in One Week

Most OpenClaw setup guides stop at "install it, connect Telegram, chat with it." That's the tutorial. This is what happens when you actually live with it for a week and push it toward real autonomy.

I run OpenClaw on a Mac with Claude Opus 4.6 as my main brain, backed by a tiered model system that routes cheap tasks to Gemini Flash and mid-tier work to Claude Sonnet. The agent's name is Talos, my co-pilot. Here's what the setup actually looks like.

The Hardware

Nothing fancy. No cloud VPS, no Kubernetes. Just a Mac that never sleeps.

The Model Routing That Actually Works

After burning through API credits learning what doesn't work, I landed on a three-tier system:

| Tier | Model | What It Does | Cost |
| --- | --- | --- | --- |
| Cheap | Gemini 2.0 Flash | Email checks, OAuth polling, web searches, quick research, single CLI commands | ~free |
| Reliable | Claude Sonnet 4 | File writing, code generation, multi-step tasks, content drafts | Medium |
| Heavy | Claude Opus 4.6 | Main session: complex reasoning, strategy, multi-tool orchestration | Premium |

The golden rule: if the task writes a file, use Sonnet minimum. Flash will describe what it would write instead of actually writing it. Learned that one the hard way, three times.

20 Cron Jobs Running My Life

This is where OpenClaw stops being a chatbot and starts being an operator. Here's what runs automatically:

Morning Routine

Business Hours

Evening Wind-Down

Night Shift

Infrastructure

The key insight: most of these jobs run with delivery: "none", meaning they stay silent unless there's something worth reporting. My phone isn't buzzing every 30 minutes. It buzzes when it matters.
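The silent-by-default pattern boils down to a simple wrapper around each scheduled check. A minimal sketch, where `check` and `notify` are hypothetical callables standing in for the real job logic and the messaging channel:

```python
def run_quiet_job(check, notify):
    """Run a scheduled check; only notify when there is a finding."""
    finding = check()          # returns a string, or None when all is well
    if finding:
        notify(finding)        # buzz the phone only when it matters
        return True
    return False               # stay silent otherwise
```

The cron scheduler calls this on every tick; the notification side only fires on the rare tick where `check()` actually found something.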

The Memory System

OpenClaw's workspace is basically a structured second brain:

memory/
├── daily/          # YYYY-MM-DD.md journals (permanent record)
├── projects/       # Per-project status files
├── people/         # Contact/relationship context
├── reference/      # Backlog, agent patterns, metrics, failure log
├── workflows/      # Documented procedures (night shift, etc.)
└── decisions/      # Decision logs with context and outcomes

Every work loop logs to the daily file. Every failure gets recorded in a failure log with root cause. Every decision gets documented so future-me (or future-Talos) knows why we made a call.

The agent searches this memory before answering questions about prior work. It's not perfect, but it means context survives across sessions.
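The search-before-answering step is conceptually just a recursive grep over the memory tree. A sketch under the assumption that every memory file is markdown (the function name and return shape are mine, not OpenClaw's API):

```python
import pathlib

def search_memory(root, term):
    """Case-insensitive grep over the memory/ tree; returns
    (file, line number, line text) hits for a search term."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob("*.md")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if term.lower() in line.lower():
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Because everything is flat markdown, the same memory is searchable by the agent, by standard CLI tools, and by future-me in a text editor.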

Sub-Agent Patterns (What I Learned the Hard Way)

OpenClaw can spawn sub-agents: isolated sessions that do a job and report back. Here's what actually works:

The Verification Protocol

Every sub-agent spawn follows this pattern:

  1. Spawn with a clear, single-deliverable prompt
  2. Wait 2-3 minutes
  3. Verify the output file exists (ls -la <path>)
  4. If missing: redo in the main session immediately; do NOT re-spawn

This sounds obvious, but without it, you end up with sub-agents that "complete successfully" but produce nothing. Flash is especially guilty of this: it'll read 55K tokens of input and then just... not write the output.
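The four-step protocol above can be sketched as a single function. `spawn` and `redo` are hypothetical callables (the real spawn goes through OpenClaw's sub-agent tooling); the existence check is the programmatic equivalent of `ls -la <path>`:

```python
import os, time

def spawn_and_verify(spawn, out_path, redo, wait_seconds=150):
    """Spawn a sub-agent, wait, then confirm the deliverable exists
    on disk. On a miss, redo in the main session; never re-spawn."""
    spawn()                            # step 1: single-deliverable prompt
    time.sleep(wait_seconds)           # step 2: give it 2-3 minutes
    if os.path.exists(out_path):       # step 3: verify the file exists
        return "verified"
    return redo()                      # step 4: redo in the main session
```

The point of returning to the main session on failure is that a second spawn usually fails the same way; the main session at least produces the file.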

The Decision Tree

Disposable/monitoring task?     → Flash (cheap)
Text generation < 500 words?    → Flash
Research/web search only?       → Flash
Needs to WRITE a file?          → Sonnet (reliable)
Multiple tool calls?            → Sonnet or main session
> 100 lines of code/content?    → Main session directly
Complex reasoning/strategy?     → Main session (Opus)
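The tree collapses into a short routing function. The task-dict keys here are assumptions for the sketch, not OpenClaw's real schema; the ordering matters, because the expensive checks have to win over the cheap default:

```python
def pick_model(task):
    """Route a task description to a model tier per the decision tree."""
    if task.get("complex_reasoning") or task.get("lines", 0) > 100:
        return "main-session"          # Opus handles the heavy work
    if task.get("writes_file") or task.get("tool_calls", 0) > 1:
        return "sonnet"                # Flash won't reliably write files
    return "flash"                     # cheap tier for everything else
```

Anything that falls through to the bottom (monitoring, short text, pure research) is safe to hand to the cheap tier.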

The Two-Failure Rule

If the same approach fails twice, switch strategies immediately. I wasted an entire evening trying to submit an app build through the same broken pipeline six times before implementing this rule.
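The rule itself is mechanical enough to encode. A minimal sketch: each strategy is a callable, and no strategy gets more than two attempts before the loop moves on:

```python
def two_failure_rule(strategies):
    """Try each strategy at most twice; after the second failure of
    the same approach, switch to the next instead of retrying."""
    for strategy in strategies:
        for _ in range(2):
            try:
                return strategy()
            except Exception:
                continue               # one retry, then switch strategies
    raise RuntimeError("all strategies exhausted")
```

Had this been in place that evening, the broken pipeline would have been abandoned after attempt two instead of attempt six.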

The Working Agreement

This is the part most guides skip. Talos and I have an explicit working agreement documented in markdown:

Act Freely:

Propose First:

Pushback Level: 4/5. Talos argues his case firmly, with evidence, and backs down only when I explicitly overrule after hearing the argument. This matters more than you'd think. A yes-man agent is useless.

The Principles Layer (The Missing Piece)

Most agent setups have skills (what to do) and rules (what's allowed). Almost none have principles: decision-making heuristics for when there's no clear instruction.

After a week of iteration, I extracted 9 principles from real failures and wins. They live in PRINCIPLES.md and load into every session:

1. Silent by default: don't ping me unless it matters. Background jobs stay background.
2. Revenue before polish: ship what makes money first, optimize later.
3. Two-failure rule: same approach fails twice? Switch strategy immediately.
4. Verify, don't trust: check the output exists before marking done. Sub-agents lie by omission.
5. Fix first, report after: if it's reversible and low-stakes, just fix it.
6. Pushback from care: argue the case firmly, back down when overruled.
7. Friction is data: when something keeps breaking, document why and change the approach.
8. One task, one cycle: finish one thing before starting the next.
9. Protect the principal: never expose my data, spend my money, or speak as me without approval.

The file also has a regressions table โ€” when a principle fails or a new lesson emerges, it gets logged with the date, what happened, and which principle got updated. The principles aren't static. They evolve.
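Keeping the regressions table honest is easiest when appending a row is one call. A sketch; the pipe-table row layout and column order (date, what happened, principle updated) are my assumptions about the file's format:

```python
import datetime

def log_regression(path, what_happened, principle):
    """Append a dated row to the regressions table in PRINCIPLES.md."""
    today = datetime.date.today().isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"| {today} | {what_happened} | {principle} |\n")
```

Because the row is appended rather than edited in place, the table doubles as a chronological changelog of the principles themselves.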

This was inspired by @AtlasForgeAI's post about the three-layer architecture: Soul (who to be), Principles (how to operate), Skills (what to do). The insight is that principles fill the gap between identity and capabilities; they're what the agent falls back on when the instructions run out.

Real Infrastructure Integration

Talos isn't just a chatbot that happens to run on my Mac. It's wired into:

Each integration is a skill: a markdown instruction file that teaches the agent how to use a specific CLI tool. Some are community skills from ClawHub, some are custom-built.

What I'd Do Differently

  1. Start with the tiered model system. Don't run Opus for everything. You'll blow through credits and most tasks don't need it.
  2. Build the failure log from day one. Every failure is a lesson. Document the root cause, not just "it didn't work."
  3. Use delivery: "none" on cron jobs. Let the agent decide what's worth pinging you about. Your phone should be quiet by default.
  4. Write the working agreement early. Define autonomy levels before the agent starts making decisions. It prevents the "wait, I didn't ask you to do that" moments.
  5. Verify sub-agent output. Trust but verify. Always check the file exists before marking a task done.

The Bottom Line

After one week, Talos runs 20 automated jobs, monitors my email/business/infrastructure, works autonomously while I sleep, and maintains a searchable memory of everything we've done together. It's not perfect: sub-agents still flake sometimes, Flash still refuses to write files, and the occasional cron job errors out. But the watchdog catches those, and the system self-corrects.

The setup guide gets you a chatbot. A week of iteration gets you a co-pilot.