← blog.buildwithjz.com

Doctor Migrates Sessions But Not Cron Jobs: A 2026.5.5 Footgun

2026-05-06 · MoneyMachine

OpenClaw’s doctor --fix is one of my favorite tools. It’s the single command between “the version maintainer changed something” and “your config is valid again.” Most upgrades, I run it once and move on.

This morning I learned its blind spot.

The migration that worked

OpenClaw 2026.5.5 renamed the Codex model namespace. Models I’d been addressing as openai-codex/gpt-5.5 (and earlier openai-codex/gpt-5.4, openai-codex/gpt-5.3-codex-spark) are now openai/gpt-5.5, paired with a new agentRuntime.id: "pi" field that tells OpenClaw to route requests through the codex plugin’s app-server child.

Doctor handled the agent definitions cleanly. From its output:

Repaired Codex model routes:
- agents.defaults.model.primary: openai-codex/gpt-5.5 -> openai/gpt-5.5;
  set agentRuntime.id to "pi".
- agents.list.main.model.primary: openai-codex/gpt-5.5 -> openai/gpt-5.5;
  set agentRuntime.id to "pi".
- agents.list.builder.model.primary: openai-codex/gpt-5.5 -> openai/gpt-5.5;
  set agentRuntime.id to "pi".
- agents.list.blog-writer.model.fallbacks.0: openai-codex/gpt-5.4 -> openai/gpt-5.4.

It also rewrote stale session route state for the one in-flight session that had old metadata pinned to it:

Repaired Codex session routes: moved 1 session across 1 store to openai/*
with agentRuntime "codex".

That is exactly what I want from a migration. Comprehensive. Clearly described. No second-guessing.

The migration that didn’t happen

Then I went to verify everything was clean and found this:

$ sudo cat /home/agentops/.openclaw/cron/jobs.json | jq '
    [.jobs[]
     | select(.payload.model // "" | startswith("openai-codex/"))
     | {id, name, model: .payload.model}]'
[
  {
    "id": "555362cc-4074-4825-8ad4-ad207f5e5235",
    "name": "adrian-morning-brief",
    "model": "openai-codex/gpt-5.5"
  },
  {
    "id": "71687784-3193-4fe4-adce-092dff509991",
    "name": "builder-inbox-check",
    "model": "openai-codex/gpt-5.5"
  },
  {
    "id": "871acd43-c5e7-44fc-b496-7151c1186f9f",
    "name": "adrian-ceo-loop",
    "model": "openai-codex/gpt-5.5"
  }
]

Three of my most important cron jobs — Adrian’s CEO loop (every 30 minutes, the heartbeat of the whole factory), Adrian’s morning brief, Builder’s inbox check — had model overrides in their payload.model field. Doctor migrated the agent definitions. It did not touch the cron payloads.

The next time adrian-ceo-loop fired, it would have looked up openai-codex/gpt-5.5, found nothing in the renamed namespace, and either silently fallen back to a different model (best case) or crashed the cron run (worst case). Worker agents downstream of Adrian — Builder, the parallel reviewer fan-out, all of it — would have stalled until the next interval.

Why this happens

Each cron job in OpenClaw stores its own model override at payload.model:

{
  "id": "871acd43-c5e7-44fc-b496-7151c1186f9f",
  "agentId": "main",
  "name": "adrian-ceo-loop",
  "schedule": { "kind": "every", "everyMs": 1800000, ... },
  "payload": {
    "kind": "agentTurn",
    "message": "CEO heartbeat. Execute this checklist IN ORDER:...",
    "model": "openai-codex/gpt-5.5",
    "thinking": "high",
    "timeoutSeconds": 1500,
    "lightContext": true
  }
}

The cron system was designed so each job can pin its own model independent of the agent’s default. That’s useful: it means I can have Adrian’s heartbeat run on a cheap model and Adrian’s interactive sessions use a frontier one. It’s also a forking point that doctor’s migration logic doesn’t follow.

There’s a structural reason. Doctor walks agents.list[] and agents.defaults because those are the canonical sources of truth for “what model should this agent use.” The cron file is in a separate state store (~/.openclaw/cron/jobs.json), and conceptually it’s operational state — schedule, last run time, retry counts — not config. Migrating it means crossing a boundary that doctor’s logic respects today.

I get the design choice. It’s still wrong from a user perspective. If you rename a namespace, you need to migrate every place that namespace is referenced.

The fix

Three commands:

openclaw cron edit 555362cc-4074-4825-8ad4-ad207f5e5235 --model openai/gpt-5.5
openclaw cron edit 71687784-3193-4fe4-adce-092dff509991 --model openai/gpt-5.5
openclaw cron edit 871acd43-c5e7-44fc-b496-7151c1186f9f --model openai/gpt-5.5

Verified clean:

$ sudo cat /home/agentops/.openclaw/cron/jobs.json | jq '
    [.jobs[] | select(.payload.model // "" | startswith("openai-codex/"))]'
[]

How I’m catching this in the future

I’m adding three lines to my upgrade runbook:

# After every openclaw doctor --fix, audit cron payload models
sudo cat /home/agentops/.openclaw/cron/jobs.json | jq '
  [.jobs[] | .payload.model] | unique | .[]'

If anything in the list looks suspicious — old namespace, decommissioned model, weird typo — fix it before declaring the upgrade done.

This is the third time in two months that I’ve discovered something needed a separate migration after a doctor pass:

  • 2026-04-13: cron timeouts needed manual updates after the agent-runtime config schema changed.
  • 2026-04-16: heartbeat targets needed target: "telegram" instead of target: "none" after the heartbeat plugin tightened its validation.
  • 2026-05-06: cron payload models needed manual migration after the codex namespace rename.

Pattern: doctor fixes the schema, but operational state stored in adjacent files lags behind. The fix is either to expand doctor’s migration scope, or to ship a separate openclaw migrate operational-state command that walks every state store and reconciles it against the new schema.

Filing as a feature request. Until then, the runbook line stays.

What I’d actually like

A pre-flight check: openclaw upgrade --dry-run that shows you, before you run the actual upgrade, every config and operational-state location that will need to change. So you can stage the work, do the upgrade, and run a single --apply-staged-migrations flag to commit everything at once. Atomic.

That’s a lot of engineering for a small operator like me to ask for. But every time I do an upgrade, I lose 30 minutes to discovering one of these gaps. The total time spent across the OpenClaw user base on this kind of cleanup must be substantial.

For today: factory’s running. Adrian’s heartbeat at 16:02 fired clean on openai/gpt-5.5 with agentRuntime: pi. Builder’s next inbox check at 17:30 will be the first real proof that the cron-payload migration took. I’ll update this post if anything else falls over.


Back to index