Day 9: The Great Audit — When Your Agents Look Busy But Are Actually Broken (Again)

Date: 2026-03-15 Author: Jeff (written with AI assistance) Phase: v3 Phase 1 — First Revenue

The 6-Day Gap

We left the system running on Day 3 and went to enjoy Europe for almost a week. The agents had clear instructions, the gateway was running, the cron jobs were firing. Surely they’d have something to show for six days of autonomous operation.

They did not.

Revenue: $0. Not “close to revenue” or “building toward revenue” — zero. The gateway had been running the entire time, heartbeats pulsing, agents responding to prompts, tokens being consumed. To anyone glancing at the dashboard, everything looked healthy. Busy agents, regular activity, no crashes.

But nothing was actually happening. Products were built but never deployed. Deliverables were completed but never reviewed. The entire pipeline was broken, and not a single alarm went off.

This is the second time we’ve had the “agents look busy but are actually broken” problem. Day 2 was about observability. Day 9 is about the failure mode that observability alone can’t catch: agents doing real work that goes absolutely nowhere because of a single wrong file path.

Adrian’s Path Bug: 60 Deliverables in a Black Hole

Here’s the root cause of the six-day stall: Adrian, our CEO agent, was looking for agent deliverables in workspace/agents/. The other agents were writing their deliverables to workspaces/*/ready-for-review/.

One path. One wrong directory. Sixty-plus deliverables sitting in a folder that nobody was reading.

Adrian would check his folder, find nothing, and conclude the team hadn’t produced anything. The worker agents would complete tasks, write deliverables to the correct output directory, and wait for Adrian’s review — which never came. Both sides thought the other was the bottleneck. Neither was wrong. The pipeline was just disconnected.

The fix had three parts:

Corrected Adrian’s config to look in workspaces/*/ready-for-review/ instead of workspace/agents/
Built a relay script (relay-deliverables.sh) that copies deliverables from worker output dirs to Adrian’s inbox as a safety net — if paths drift again, the relay catches it
5-minute cron job running the relay, so even if something breaks, deliverables flow within 5 minutes

Lesson: a multi-agent system with correct individual behavior can be completely broken by one path mismatch at the integration point. Each agent was doing its job perfectly. The system was producing nothing.

The Cost Wake-Up Call

While we were away, the system burned roughly EUR 30 in API costs. Not catastrophic, but notable when the revenue column reads $0.

Revenue-ops had actually flagged a cost anomaly in one of its reports. The report was sitting in ready-for-review/, unread by Adrian (see: path bug). So the agent designed to catch exactly this kind of problem did catch it — and then the report vanished into the same black hole as everything else.

Heartbeats were the main offender again. Even at reduced frequency, agents with nothing to do were still consuming tokens to say “nothing to report.” We’ve now tightened this further: worker agents get zero heartbeats. They activate only when Adrian delegates a task. No idle burn.

Product Quality Reality Check

Two products had been “built” by the builder agent. Sounds good on paper. In practice:

Product 1 (Scholarship Search Toolkit): The guide was 21 lines. Not 21 pages — 21 lines. The CSV data files were headers only. No data. The agent had created the file structure, written the headers, and marked the task as complete. Technically, the deliverable existed. Practically, it was empty.

Product 2 (marketing materials): The marketer agent had written landing page copy that included fabricated testimonials from fictional users and invented statistics. “Over 5,000 families have used our toolkit!” We’ve helped zero families. The number was a hallucination presented as fact.

This is the lesson that keeps coming back: AI agents will confidently produce garbage if nobody checks. Not sometimes — every time the quality bar isn’t enforced externally. The agent doesn’t know its CSV is empty. The marketer doesn’t know its testimonials are fake. They completed their tasks according to their prompts and reported success.

Rule going forward: Nothing ships without a human reviewing the actual output files. Not the completion report. The files.

The Payment Integration Mess

We’d been planning to use LemonSqueezy for info product sales. After approval came through, we discovered a fundamental architecture mismatch: LemonSqueezy requires separate storefront approval for each store. Our portfolio approach means dozens of products across different domains. Getting per-storefront approval for each one defeats the purpose of rapid launch-and-test.

Pivot: Stripe Payment Links. Zero approval process per product (we already had a Stripe account). Create a product, generate a link, embed it on the page. Done in minutes, not days.

We also set up Payhip as a backup storefront — it handles digital product delivery natively, which Stripe Payment Links doesn’t. Belt and suspenders.

Lesson learned the hard way: payment integration should be validated BEFORE building products, not after. We built two products before confirming we could actually accept money for them.

Infrastructure Upgrades

The audit triggered a wave of infrastructure hardening:

OpenClaw v2026.3.13 — Updated from 2026.3.2, re-applied all EISDIR patches (5 code paths, 8 files, same as before)
systemd service — Gateway now runs as a proper systemd unit with automatic restart on crash. No more manual nohup and praying
Watchdog script — Checks gateway health every 2 minutes, restarts if unresponsive, sends Telegram alert
EISDIR patches updated — New version had slightly different minified output; patches needed adjustment
fail2ban — SSH brute-force protection. Should have been day 1, honestly
Webshare residential proxy — Reddit blocks datacenter IPs for unauthenticated scraping. Residential proxy bypasses this. Scout can now scan Reddit without waiting for API approval

First Product Goes Live

After the audit and all the fixes, we actually shipped something.

The Scholarship Search Concierge Toolkit — properly rebuilt with real content this time — is deployed to Cloudflare Pages at toolkit.buildwithjz.com. It includes:

A functional scholarship search guide (not 21 lines)
Working CSV datasets with actual data
Stripe checkout at $29
Legal pages (terms, privacy, refund policy) hosted at legal.buildwithjz.com

Is this going to make $10k/month? No. But it’s the first product in the pipeline with a real purchase flow, real hosting, and real payment processing. The infrastructure now exists to go from “agent builds product” to “customer can buy product” without a week-long integration detour.

Day 9 by the Numbers

Metric	Value
Revenue	$0
Days agents ran unattended	6
Useful output from those 6 days	Near zero
Deliverables stuck in wrong path	60+
API cost during absence	~EUR 30
Products “built” by agents	2
Products actually shippable	0 (before audit)
Fabricated testimonials caught	Multiple
Products now live	1 (Scholarship Toolkit, $29)
Payment provider pivot	LemonSqueezy -> Stripe Payment Links
OpenClaw version	2026.3.13 (from 2026.3.2)

Key Lessons

Autonomous agents need supervision infrastructure. A watchdog, relay scripts, health checks, alerting. “Set it and forget it” is a fantasy. The agents will keep running and keep consuming resources whether or not they’re doing anything useful.
“Built” doesn’t mean “ready to sell.” Always QA agent output by opening the actual files. A completion report that says “task done” is worthless if the deliverable is 21 lines of nothing.
Payment integration should be set up BEFORE building products. We built two products before confirming we could accept money. Validate the entire pipeline end-to-end before investing build time.
A wrong file path can silently break an entire multi-agent pipeline. Each agent was working correctly in isolation. The system failed at the integration point. This is the hardest class of bug in multi-agent systems — everything looks right until you trace the data flow.
AI agents will fabricate social proof. Testimonials, statistics, user counts — if the prompt doesn’t explicitly prohibit invention, the model will invent. Review everything before it faces a customer.

What’s Next: Day 10+

Start the gateway with fixed agent paths and the relay cron active
Market the scholarship toolkit — Reddit threads, HN, targeted outreach
Rebuild the expat relocation kit with real content (not empty CSVs)
Expand Scout’s signal sources to Reddit via the residential proxy
First real revenue? We’ll see. The purchase flow works. Now we need a buyer.

Nine days in. Six of them wasted. EUR 30 burned on agents talking to themselves. Two products built that couldn’t be sold. But we learned more from the failure than from the building — and the first product is now actually live with a working checkout. The gap between “agents are running” and “the business is running” is larger than any architecture diagram suggests. We’re closing it one broken path at a time.