The Problem with Business Automation at Scale
Most businesses start automating in the same way. Someone discovers Zapier or Make, wires up a few triggers and actions, and within an afternoon they have eliminated a tedious manual task. It feels like a superpower. It is a superpower — for simple, stable, linear workflows.
Then the business grows. The workflows get more complex. The edge cases multiply. The team changes. And slowly, quietly, the automation layer becomes one of the most brittle and least understood parts of the entire operation.
This is not a criticism of any particular tool. It is a description of a structural problem that shows up regardless of what tool you use — unless you design around it from the start.
The no-code ceiling
No-code automation tools are genuinely excellent for what they are designed to do: connect well-defined SaaS APIs with minimal engineering effort. The problem is not the tools. It is what happens as requirements evolve.
No version control. When a Zapier workflow breaks, there is no git log to tell you what changed, who changed it, or when. There is no way to diff the current state against last week's state. There is no rollback button that restores the exact configuration that was working three months ago. Debugging becomes archaeology.
Hidden state. Complex no-code workflows accumulate implicit state — values passed through steps, filters applied mid-chain, conditional branches that only fire in specific circumstances. This state is rarely documented and almost never tested. It lives in the tool's UI, visible only to whoever built it, understandable only to whoever remembers building it.
Vendor lock-in by logic. When your business logic lives inside a third-party workflow tool, migrating away means rebuilding every workflow from scratch. The logic is not portable. It is not inspectable by a developer. It cannot be reviewed, linted, or tested with standard engineering tools.
Cost unpredictability at volume. Per-task pricing models work fine at low volume. At the scale where automation is providing real business value — thousands or tens of thousands of runs per month — the pricing often becomes the argument for building something custom.
The monolith problem
The next phase is usually a custom script. A developer writes a Python or Node.js file that does everything: fetches the data, transforms it, calls the API, sends the notification. It runs on a cron job somewhere. It works.
Until it doesn't.
Monolithic automation scripts have a specific failure mode: they fail silently, partially, and at the worst possible time. The script runs, gets halfway through, hits an unexpected API response, throws an unhandled exception, and exits. Half the records were processed. Half were not. Nobody knows which half. The next run processes everything again and creates duplicates. Or skips them entirely. Or both, depending on the day of the week.
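In code, the fragile shape looks something like the sketch below. Everything in it is hypothetical (the record shape, the flaky third-party call); the point is the structure, not the specifics.

```python
records = [{"id": i, "value": i * 10} for i in range(100)]

def flaky_api(record):
    # Stand-in for a third-party call that chokes on an unexpected input.
    if record["id"] == 57:
        raise RuntimeError("unexpected response shape")
    return {**record, "synced": True}

processed = []
try:
    for record in records:
        processed.append(flaky_api(record))
except RuntimeError:
    pass  # in the real script there is no except: the process simply dies

# 57 of 100 records made it through before the crash, and nothing on disk
# records which ones. The next cron run starts from zero and repeats them,
# creating duplicates unless the downstream system happens to be idempotent.
print(f"{len(processed)} of {len(records)} records processed before the crash")
```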
The deeper problem is that as the script grows — and it always grows — it becomes impossible to reason about. Adding a new data source means understanding every assumption embedded in the existing code. Changing the output format means finding everywhere the format is referenced. Testing any part of it means setting up the entire thing.
This is the monolith problem that microservice architecture was invented to solve. The same problem exists in automation, and the same solution applies.
The naive LLM integration
When developers first add LLM calls to automation workflows, the usual pattern is to drop an API call into the existing code. The script fetches some data, passes it to an LLM, gets a response back as a string, and proceeds.
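In concrete terms, the naive version looks something like the sketch below, which uses the OpenAI Python SDK as one example; the model name, prompt, and expected keys are placeholders, not a recommendation.

```python
import json
from openai import OpenAI

client = OpenAI()

def categorise_ticket(ticket_text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Classify this support ticket as JSON with keys "
                       f"'category' and 'urgency':\n\n{ticket_text}",
        }],
    )
    reply = resp.choices[0].message.content
    return json.loads(reply)  # assumes the reply is clean JSON, which it often is not

# The dict is then passed straight into the next step of the workflow.
```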
This works in demos. In production, several things break.
Unvalidated output. LLM responses are free-form text. Code that expects a JSON object and receives a slightly different JSON object — or valid JSON with an unexpected key, or a markdown code block containing JSON — fails in ways that are hard to anticipate and harder to debug.
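One way to contain this is to treat parsing as its own guarded step: strip the code fences the model sometimes adds, parse, and check the shape before anything downstream sees the value. A minimal sketch, with an illustrative expected key set:

```python
import json
import re

EXPECTED_KEYS = {"category", "urgency"}  # illustrative contract for this step

def parse_llm_json(reply: str) -> dict:
    # Models frequently wrap JSON in markdown code fences; strip them first.
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", reply, re.DOTALL)
    raw = fenced.group(1) if fenced else reply

    data = json.loads(raw)  # raises on non-JSON replies
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError(f"unexpected shape: {data!r}")
    return data
```

Anything that fails here can be retried or routed to a human instead of silently becoming the next step's input.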
No retry logic for soft failures. LLMs occasionally return responses that are technically valid but logically wrong: a hallucinated value, a misunderstood instruction, a response that passes a length check but fails a semantic one. Without validation and retry logic, these failures propagate silently downstream.
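A bounded validate-and-retry loop addresses this. In the sketch below, call_model and validate are hypothetical: the former wraps whatever client is in use, the latter raises ValueError when the output is well-formed but semantically wrong.

```python
def call_with_validation(prompt, call_model, validate, max_attempts=3):
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # On retries, feed the rejection reason back so the model can correct itself.
        full_prompt = prompt if attempt == 1 else (
            f"{prompt}\n\nYour previous answer was rejected because: {last_error}"
        )
        reply = call_model(full_prompt)
        try:
            return validate(reply)  # semantic checks, not just "is it JSON"
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"output failed validation after {max_attempts} attempts: {last_error}")
```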
Unbounded execution time. LLM calls can take seconds or minutes depending on the model and the payload size. A synchronous script that blocks on an LLM call inside a loop will tie up a server, exhaust timeouts, and create cascading failures under load.
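Bounding both latency and concurrency keeps one slow model call from dragging the whole run down. A sketch using asyncio, where call_llm is a hypothetical async wrapper around whatever client is in use:

```python
import asyncio

MAX_CONCURRENT = 5       # cap in-flight model calls instead of looping synchronously
PER_CALL_TIMEOUT = 60    # seconds; fail fast rather than hang the whole run

semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def bounded_call(call_llm, payload):
    async with semaphore:
        try:
            return await asyncio.wait_for(call_llm(payload), timeout=PER_CALL_TIMEOUT)
        except asyncio.TimeoutError:
            return None  # record the miss so the orchestrator can retry it later

async def process_all(call_llm, payloads):
    return await asyncio.gather(*(bounded_call(call_llm, p) for p in payloads))
```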
No cost accountability. When every step of an automation workflow calls an LLM, and the workflow runs thousands of times, the token costs become significant. Without visibility into which steps are calling which models with what payloads, optimisation is guesswork.
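The minimum viable fix is to log usage per step and per model on every call. The sketch below assumes the usage object shape of the OpenAI chat completions response (prompt_tokens and completion_tokens); other providers use different field names.

```python
import logging

logger = logging.getLogger("llm_cost")

def record_usage(step_name: str, model: str, usage) -> None:
    # One structured log line per model call; aggregate by step or model later.
    logger.info(
        "step=%s model=%s prompt_tokens=%d completion_tokens=%d",
        step_name, model, usage.prompt_tokens, usage.completion_tokens,
    )
```

With that in place, "which step is burning the budget" becomes a query over logs rather than a guess.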
No separation of concerns. When data fetching, LLM reasoning, and output delivery are all tangled together in one function, none of them can be tested, replaced, or scaled independently.
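Pulled apart, the same workflow becomes three functions and a thin runner, each testable with stubs for the other two. The names and signatures here are illustrative only:

```python
def fetch_records(source_url: str) -> list[dict]:
    ...  # I/O only: no reasoning, no delivery

def classify(records: list[dict], call_model) -> list[dict]:
    ...  # reasoning only: data in, labelled data out

def deliver(labelled: list[dict], webhook_url: str) -> None:
    ...  # delivery only: no knowledge of where the labels came from

def run(source_url: str, webhook_url: str, call_model) -> None:
    deliver(classify(fetch_records(source_url), call_model), webhook_url)
```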
What the failure actually looks like
These are not theoretical problems. Here is what they look like in practice:
A weekly analytics report workflow stops sending emails. Nobody notices for three weeks because the failure is silent. Investigation reveals that an upstream API changed its response format six weeks ago, the script started failing two weeks later, and the three weeks of missing reports cannot be reconstructed because there was no state tracking.
A customer onboarding automation sends duplicate welcome emails to 200 customers because a retry loop did not check whether the first attempt had already succeeded. There is no idempotency key because nobody thought to add one.
An LLM-powered data enrichment job runs for four hours before timing out, having processed 60% of the records. Re-running it processes everything again. There is no checkpoint system because checkpointing was going to be added "later."
A junior developer updates the prompt in a data classification agent without realising it affects a downstream reporting workflow that had been calibrated around the old output format. The reports are wrong for a quarter before anyone notices.
These failures share a common thread: they are not failures of the individual tools or models. They are failures of architecture. The components work. The system does not, because there was no system — only components.
What a solution looks like
A robust agentic automation system needs to provide:
- Isolated, single-purpose agents so failures are contained and components are independently replaceable
- A typed contract between agents so output from one step is validated before it becomes input to the next (a sketch follows this list)
- A persistent orchestration layer that tracks workflow state and can resume, retry, or alert on failure
- Git-based deployment so every change to agent logic is reviewed, versioned, and reversible
- Cost-efficient infrastructure that does not charge for idle agents between runs
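As one illustration of the typed contract, the sketch below uses Pydantic (v2) to validate one agent's output before the next agent accepts it; the model and field names are invented for the example.

```python
from pydantic import BaseModel

class EnrichedLead(BaseModel):
    company: str
    employee_count: int
    segment: str  # e.g. "smb", "mid-market", "enterprise"

def hand_off(raw_output: str) -> EnrichedLead:
    # Raises pydantic.ValidationError if the upstream agent's output does not
    # match the contract, so the orchestrator can retry upstream instead of
    # letting malformed data reach the downstream agent.
    return EnrichedLead.model_validate_json(raw_output)
```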
The rest of this guide describes exactly that system.