The first overnight

I went to bed at 1 AM CEST with four tasks queued:

A) Hire Week 2 agents (Designer, Copywriter, CMO, Legal, SEO)
B) Build the Factory Flow dashboard view
C) Build the Distribution Posts dashboard view
D) Monitor the first real product deploy chain and write a morning recap

No babysitting. No mid-session check-ins. Closed laptop, went to sleep.

This post is what I saw when I opened the laptop in the morning.

Three products live on the internet

Between 1:32 and 1:47 CEST, while I was asleep, the v6 factory chain did this end-to-end, for the first time, without human intervention:

Adrian (CEO agent, gpt-5.3-codex-spark) read my P0 brief from his inbox, picked three products from the 14-item backlog, and wrote three deploy specs to Release Engineer’s inbox — one batched, one urgent.
Release Engineer (newly hired in Week 1, same model) read the specs, fixed a template regression in one of the builder outputs, deployed all three products to Cloudflare Pages, created DNS CNAMEs, and handed off to QA Engineer.
QA Engineer (newly hired in Week 1) ran a Playwright battery on all three products and wrote verdict files.

Live URLs as I write this:

ai-visibility-audit-playbook.buildwithjz.com → 200
aeo-starter-kit.buildwithjz.com → 200
mcp-security-hardening-guide.buildwithjz.com → 200

All three with real Stripe Payment Links for purchase. No placeholders, no {{CHECKOUT_LINK}} leaks. This was v5’s recurring failure mode and it didn’t happen.
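A minimal sketch of the kind of check that catches this failure mode. The function names are hypothetical, not the factory's actual QA code, and the `buy.stripe.com` host is an assumption about how the Payment Links appear in the page:

```typescript
// Hypothetical helpers, not the factory's real QA battery.

// Returns each distinct unrendered template token, e.g. "{{CHECKOUT_LINK}}".
function findPlaceholderLeaks(html: string): string[] {
  const matches = html.match(/\{\{\s*[A-Z0-9_]+\s*\}\}/g) || [];
  // Deduplicate while preserving first-seen order.
  return matches.filter((token, i) => matches.indexOf(token) === i);
}

// Assumption: a real Payment Link shows up as an href on buy.stripe.com.
function hasStripePaymentLink(html: string): boolean {
  return /href="https:\/\/buy\.stripe\.com\/[^"]+"/.test(html);
}

// Invented sample markup for illustration only.
const leakyPage = '<a href="{{CHECKOUT_LINK}}">Buy now</a>';
const goodPage = '<a href="https://buy.stripe.com/test_abc123">Buy now</a>';

console.log(findPlaceholderLeaks(leakyPage)); // [ '{{CHECKOUT_LINK}}' ]
console.log(findPlaceholderLeaks(goodPage));  // []
console.log(hasStripePaymentLink(goodPage));  // true
```

Running both checks against the rendered HTML, rather than the template source, is what makes the test meaningful: a leak only matters if it reaches the page a buyer sees.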

The two bugs it found and what I had to do

Bug 1: Release Engineer forgot to attach the custom domain to the Pages project.

Cloudflare Pages requires three steps for a subdomain to return 200:

1. Deploy the code (wrangler pages deploy)
2. Create a DNS CNAME (<sub>.buildwithjz.com → <proj>.pages.dev, proxied)
3. Attach the custom domain to the Pages project (a separate API call)

RE did steps 1 and 2 for all three products. It missed step 3. Result: DNS routed requests to Pages, Pages said “I don’t serve this hostname,” Cloudflare returned 522 errors. I fixed it by calling the Pages API for each product and patched Release Engineer’s DIRECTIVES.md with a prominent “EASY TO MISS” section so this doesn’t happen on the next deploy. Thirty seconds to fix, ten minutes to write a good enough DIRECTIVES patch that future RE runs won’t repeat it.
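The three steps above lend themselves to an explicit checklist, so a missed step fails loudly before handoff. This is a sketch under assumptions: the `DeployState` shape and both function names are hypothetical, and the endpoint path reflects my understanding of Cloudflare's Pages "add domain" API, so verify it against the current API reference before relying on it.

```typescript
// Hypothetical shape: one flag per required step.
interface DeployState {
  codeDeployed: boolean;         // step 1: wrangler pages deploy succeeded
  cnameCreated: boolean;         // step 2: proxied DNS CNAME exists
  customDomainAttached: boolean; // step 3: hostname attached to the Pages project
}

// Returns the steps that still need to happen, in order.
function missingSteps(state: DeployState): string[] {
  const missing: string[] = [];
  if (!state.codeDeployed) missing.push("deploy");
  if (!state.cnameCreated) missing.push("cname");
  if (!state.customDomainAttached) missing.push("attach-domain");
  return missing;
}

// Builds (but does not send) the request for step 3.
// Assumption: path and body follow Cloudflare's documented Pages domains endpoint.
function attachDomainRequest(accountId: string, project: string, hostname: string) {
  return {
    method: "POST",
    url: "https://api.cloudflare.com/client/v4/accounts/" + accountId +
      "/pages/projects/" + project + "/domains",
    body: JSON.stringify({ name: hostname }),
  };
}

// The overnight situation: steps 1 and 2 done, step 3 missed.
const overnight: DeployState = {
  codeDeployed: true,
  cnameCreated: true,
  customDomainAttached: false,
};

console.log(missingSteps(overnight)); // [ 'attach-domain' ]
```

A handoff rule of "do not notify QA unless `missingSteps` returns an empty list" would have turned the 522s into a blocked deploy instead of a failed test.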

Bug 2: URL mismatch between Adrian’s QA brief and Release Engineer’s QA handoff.

Adrian told QA to test https://cloudaiops.com/<slug> paths. Release Engineer told QA to test https://<slug>.buildwithjz.com. QA picked Adrian’s URLs and reported FAIL because cloudaiops.com doesn’t have routing for those paths.

This is a factory coordination bug. Adrian thought the products would land on cloudaiops.com — that’s our content/affiliate site, not our product hosting. Release Engineer correctly chose buildwithjz.com subdomains (Jeff’s W2 constraint — cloudaiops is content-only, not commerce). When two sources disagreed about the URL, QA trusted the wrong one.

I wrote a correction spec to QA’s inbox at 1:50 CEST telling it to re-run on the correct URLs, which the next cron picked up. For the morning: Adrian’s initial QA brief generator needs to pull the deploy URL from RE’s handoff, not from the original spec — that’s a DIRECTIVES fix.
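One way to encode that precedence rule, as a sketch: the deploy handoff wins whenever it exists, and a disagreement is flagged rather than silently resolved. The types and the `resolveQaTarget` name are hypothetical, and the slug in the example is just one of the three products used for illustration.

```typescript
// Hypothetical inputs: the real inbox file formats are not shown in this post.
interface QaInputs {
  specUrl?: string;    // URL from the CEO agent's original brief
  handoffUrl?: string; // URL from Release Engineer's deploy handoff
}

interface ResolvedTarget {
  url: string;
  source: "handoff" | "spec";
  conflict: boolean; // true when both sources exist and disagree
}

function resolveQaTarget(inputs: QaInputs): ResolvedTarget {
  const { specUrl, handoffUrl } = inputs;
  if (handoffUrl) {
    // The deployer knows where the product actually landed.
    const conflict = specUrl !== undefined && specUrl !== handoffUrl;
    return { url: handoffUrl, source: "handoff", conflict };
  }
  if (specUrl) {
    return { url: specUrl, source: "spec", conflict: false };
  }
  throw new Error("No URL to test: neither handoff nor spec provided one");
}

const target = resolveQaTarget({
  specUrl: "https://cloudaiops.com/aeo-starter-kit",
  handoffUrl: "https://aeo-starter-kit.buildwithjz.com",
});

console.log(target.url);      // https://aeo-starter-kit.buildwithjz.com
console.log(target.conflict); // true
```

Surfacing `conflict` in the verdict file keeps the upstream mismatch visible even when the test itself passes.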

What Week 2 looks like, concretely

Five new agents joined the gateway:

Agent	Model	Role
Designer	Claude Sonnet 4.6	Visual quality review — hierarchy, spacing, typography, CTAs, trust, mobile, OG images
Copywriter	Claude Sonnet 4.6	Voice review — catches banned phrases, weak CTAs, fake claims, refund language
CMO	Claude Sonnet 4.6	Strategic GTM — positioning, channel mix, break-even math, launch window, kill thresholds
Legal	Claude Haiku 4.5	Compliance — claim substantiation, disclosures, privacy, refund, trademark collision, W2-conflict (the one that matters most to me)
SEO	GLM-5.1 (Ollama Cloud)	On-page SEO, keyword strategy, AEO readiness, content briefs for Blog Writer

All five read an inbox every 10 minutes, in parallel. When Adrian ships a product, he fans out to all five simultaneously — not sequentially — so the 5-way review completes in the time of the slowest reviewer, not the sum of them all.
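To make the "slowest reviewer, not the sum" claim concrete, here is a small sketch. The reviewer names come from the table above; the per-review durations are invented for illustration, and `fanOut` is a hypothetical stand-in for however the dispatch actually happens.

```typescript
// Invented durations in minutes, for illustration only.
const reviewMinutes: Record<string, number> = {
  Designer: 4,
  Copywriter: 3,
  CMO: 6,
  Legal: 2,
  SEO: 5,
};

// Sequential review costs the sum of all reviewers.
function sequentialMinutes(d: Record<string, number>): number {
  return Object.keys(d).reduce((sum, name) => sum + d[name], 0);
}

// Parallel review costs only the slowest reviewer.
function parallelMinutes(d: Record<string, number>): number {
  return Math.max(...Object.keys(d).map((name) => d[name]));
}

type Review = (product: string) => Promise<string>;

// Dispatch to every reviewer at once. A failing reviewer becomes an
// "ERROR: ..." verdict instead of discarding the other verdicts.
async function fanOut(product: string, reviewers: Record<string, Review>) {
  const names = Object.keys(reviewers);
  const verdicts = await Promise.all(
    names.map((name) =>
      reviewers[name](product).catch((err) => "ERROR: " + String(err)),
    ),
  );
  return names.map((name, i) => ({ name, verdict: verdicts[i] }));
}

console.log(sequentialMinutes(reviewMinutes)); // 20
console.log(parallelMinutes(reviewMinutes));   // 6
```

With these made-up numbers the parallel gate takes 6 minutes instead of 20; the real saving depends on how uneven the reviewers are.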

Gateway registers 14 agents (was 7 at the start of v6 Week 1).

Adrian didn’t actually use the parallel fan-out tonight — he went straight from Builder to Release Engineer, bypassing the five reviewers. That’s fine for this first run, but also a behavior I need to reinforce in his DIRECTIVES. The parallel reviewers exist so the first deploy isn’t the first quality check; tonight they got skipped. Tomorrow’s job: make sure Adrian uses them.

Dashboard Phase 4

Two new views went live:

#/factory — a 10-stage state-machine visualization. Signals → Scored → Build Queue → Building → Builder Review → Parallel QA Gates → Deploying → QA → Live → Distributing. Each stage shows count, breakdown, last activity, and a red outline when items are past SLA. Auto-refresh every 15 seconds. Playwright-verified; zero console errors.
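A rough sketch of the per-stage summary behind a view like this. The stage names are the ten listed above; the SLA thresholds, the `Item` shape, and the timestamps are all hypothetical placeholders rather than the dashboard's real values.

```typescript
// Stage names from the post, in pipeline order.
const STAGES = [
  "Signals", "Scored", "Build Queue", "Building", "Builder Review",
  "Parallel QA Gates", "Deploying", "QA", "Live", "Distributing",
] as const;
type Stage = (typeof STAGES)[number];

// Hypothetical item shape.
interface Item {
  id: string;
  stage: Stage;
  enteredAt: Date; // when the item entered its current stage
}

// Hypothetical SLAs in minutes. Stages without an entry never go red.
const SLA_MINUTES: Partial<Record<Stage, number>> = {
  Building: 60,
  Deploying: 15,
  QA: 30,
};

// One row per stage: how many items, and whether any is past its SLA.
function summarize(items: Item[], now: Date) {
  return STAGES.map((stage) => {
    const inStage = items.filter((it) => it.stage === stage);
    const sla = SLA_MINUTES[stage];
    const pastSla =
      sla !== undefined &&
      inStage.some((it) => (now.getTime() - it.enteredAt.getTime()) / 60000 > sla);
    return { stage, count: inStage.length, pastSla };
  });
}

// Invented sample data: one item stuck in Deploying for 30 minutes.
const now = new Date("2025-01-01T02:00:00Z");
const rows = summarize(
  [
    { id: "a", stage: "Deploying", enteredAt: new Date("2025-01-01T01:30:00Z") },
    { id: "b", stage: "Live", enteredAt: new Date("2025-01-01T00:00:00Z") },
  ],
  now,
);

console.log(rows[6]); // { stage: 'Deploying', count: 1, pastSla: true }
```

The `pastSla` flag is what would drive the red outline; terminal stages such as Live simply have no SLA entry.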

#/distribution — the source→revenue attribution surface. Reads the distribution_posts Postgres table shipped in Week 1. Empty right now (Distribution Drafter comes online Week 3) but the plumbing works. 8 totals-tiles, platform + product breakdowns, 200-row posts table with UTM tags.

I could watch the factory move in real time on the dashboard while the chain was running. Didn’t need to narrate it to myself or ssh in and tail a dozen files.

The time scoreboard

Week 1 (agents + schema + blog site) took 3h 20m of active time.

Week 2 hiring took 15 minutes of active time. Five new agents, each with a SOUL.md + DIRECTIVES.md + IDENTITY.md + workspace scaffolding, running on three models from two providers, with five new crons wired. The pattern from Week 1 compiled once and amortized.

Dashboard Phase 4 took ~45 minutes of active time. Both views, both API endpoints, both Playwright-verified, both pushed to GitHub. The factory-flow endpoint alone is a single Promise.all over 18 Postgres queries + 7 filesystem counts — that’s the kind of thing that would have been a 6-hour solo day a year ago.
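For readers who have not seen the pattern, here is a stripped-down sketch of a labeled `Promise.all` aggregator. It is not the real endpoint: the counters are injected as plain async functions so the example runs without Postgres or a filesystem, and the labels and values are made up.

```typescript
// A counter is any async function that resolves to a number,
// e.g. a wrapped SELECT count(*) or a directory listing length.
type Counter = () => Promise<number>;

async function collectCounts(
  counters: Record<string, Counter>,
): Promise<Record<string, number>> {
  const labels = Object.keys(counters);
  // Every counter starts immediately, so total latency is roughly the slowest one.
  // Note: Promise.all rejects as soon as any single counter rejects.
  const values = await Promise.all(labels.map((label) => counters[label]()));
  const out: Record<string, number> = {};
  labels.forEach((label, i) => {
    out[label] = values[i];
  });
  return out;
}

// Made-up stand-ins for the real queries and filesystem counts.
const demoCounters: Record<string, Counter> = {
  buildQueue: async () => 2,
  deploying: async () => 1,
  inboxFiles: async () => 7,
};

collectCounts(demoCounters).then((counts) => console.log(counts));
// { buildQueue: 2, deploying: 1, inboxFiles: 7 }
```

Because one failed query rejects the whole batch, an endpoint built this way returns either a complete snapshot or an error, never a partially filled dashboard.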

Overnight monitoring: zero active time. The factory ran itself. The two bugs it hit got caught by the system + fixed by me in five minutes each (with DIRECTIVES patches so they don’t recur).

Running scoreboard:

Phase	Scope	Active time	Solo estimate	Multiplier
v6 design + Week 1	4 tasks (agents + schema + blog)	3h 20m	40h	12×
v6 Week 2 hiring	5 specialist agents + cron wiring	15m	40h	160×
Dashboard Phase 4	Factory Flow + Distribution views	~45m	20h	27×
Total v6 (W1+W2+dash P4)	9 agents + schema + 2 dashboard views + 2 bugfixes + live deploys	~4h 20m	100h	23×

The Week 2 number is ridiculous in a “compounding returns” way. Week 1 built the pattern — agent workspace scaffolding, SOUL.md template, DIRECTIVES.md template, openclaw.json patch script, inbox-check cron template. Week 2 was five applications of that pattern in a row. Same scope, different models, same shape. 160× isn’t a sustained reality — it’s what happens when the second week writes itself once the first week exists. By Week 3 the number comes back down as I hit genuinely new engineering (browser automation for Distribution Drafter, auth for platform posting, the autopost trust ledger).
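The multipliers in the scoreboard are just solo estimate divided by active time, rounded to the nearest whole number. A quick sketch that reproduces them from the table's figures (treating "~45m" as exactly 45 minutes, which is an assumption):

```typescript
interface Phase {
  name: string;
  activeMinutes: number;
  soloHours: number;
}

// Figures taken from the scoreboard table.
const phases: Phase[] = [
  { name: "v6 design + Week 1", activeMinutes: 200, soloHours: 40 }, // 3h 20m
  { name: "v6 Week 2 hiring", activeMinutes: 15, soloHours: 40 },
  { name: "Dashboard Phase 4", activeMinutes: 45, soloHours: 20 },
];

// Multiplier = solo estimate / active time, in the same units.
function multiplier(p: { activeMinutes: number; soloHours: number }): number {
  return Math.round((p.soloHours * 60) / p.activeMinutes);
}

// Sum the phases to get the total row.
const total = phases.reduce(
  (acc, p) => ({
    activeMinutes: acc.activeMinutes + p.activeMinutes,
    soloHours: acc.soloHours + p.soloHours,
  }),
  { activeMinutes: 0, soloHours: 0 },
);

phases.forEach((p) => console.log(p.name + ": " + multiplier(p) + "x"));
// v6 design + Week 1: 12x
// v6 Week 2 hiring: 160x
// Dashboard Phase 4: 27x
console.log("Total: " + multiplier(total) + "x"); // Total: 23x
```

The total works out to 100 solo hours over 260 active minutes, which is why the blended figure sits near 23× even though one row shows 160×.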

What this means for the factory

Two claims I can now make with evidence:

1. The deploy bottleneck is solved. Three products went from Builder-approved-but-stranded to live-on-public-URL-with-working-Stripe in under 20 minutes, zero human fingerprints on the process. That was the whole point of Release Engineer, and it worked.

2. The QA gate is real but not infallible. QA Engineer ran. It wrote verdict files. It had wrong URLs. It reported FAIL. That’s not a bug in the agent — that’s exactly what a QA engineer does when the test plan is wrong. The fix is a DIRECTIVES clarification upstream, not a model swap. This is the difference between “AI agents sort-of work” and “AI agents are a supervised team.” Tonight the team had a coordination bug and the system surfaced it. Fine.

What’s next today

Re-dispatch the three products through the parallel review gates (Designer, Copywriter, CMO, Legal, SEO). First real test of the full 5-way quality fan-out.
Fix Adrian’s DIRECTIVES to always use RE’s handoff URL, not the spec URL, when writing QA briefs.
The MCP Security Hardening Guide is the most time-sensitive — there’s an active CVE conversation on HN right now. If Legal clears it and SEO has a content brief ready, Distribution Drafter (Week 3) can draft a Show HN post.
Adrian’s cron error “Unsupported channel: heartbeat” is cosmetic — work completes — but 38 consecutive errors clutter the log. Root-cause that.

I think the thing I didn’t expect was the acceleration of Week 2. I had budgeted an hour. It took fifteen minutes. Not because I’m faster today than yesterday, but because the shape of “hire a specialist agent” is fully written down and executable. The factory is starting to feel less like a collection of scripts and more like an organization with a personnel file. Every future hire gets cheaper.

Three products went live on the internet while I slept. The distance between “idea” and “for sale” was four hours of human time over two days. The distance between “for sale” and “first dollar” is the harder problem, but it’s the only problem that’s left.

— Jeff (withJZ)