Charlie Holland · Architecture · 6 min read
Serverless at Scale: An Honest Take
A thousand tiny Lambda functions scattered everywhere with no ownership, no contracts, and no way to test them locally. Sound familiar? Here's when serverless works, when it doesn't, and the discipline most teams skip.
I’ll be honest: my default reaction to serverless at scale is scepticism.
A whole load of 10-line functions scattered all over the place, wired together with a tangle of SNS topics, SQS queues, and EventBridge rules. No one owns any of it. No one can draw the system on a whiteboard. Testing means deploying to AWS and praying. It’s the kind of architecture that works brilliantly in a conference demo and falls apart the moment a second team touches it.
And yet — some organisations do run serverless at scale, successfully, without hating themselves. So either I’m wrong, or there’s something they’re doing that most teams aren’t.
Turns out, it’s the latter.
When serverless becomes a mess
The failure mode is predictable and I’ve seen it enough times to describe it from memory:
Function-per-if-statement “micro-functions.” Someone decides that each Lambda should do exactly one thing, and interprets that as “each Lambda should be eight lines of code.” You end up with hundreds of functions that are individually trivial and collectively incomprehensible.
Ad-hoc wiring with no model of the system. SNS feeds SQS feeds Lambda feeds EventBridge feeds another Lambda. Nobody has a diagram. Nobody knows what happens when you change an event schema. You find out when something breaks on a Friday afternoon.
No contracts, no domains, no ownership. Functions are created by whoever needs them, named inconsistently, and deployed through whatever mechanism was convenient that day. There’s no concept of bounded contexts or service ownership.
Testing is an afterthought. The functions are so tightly coupled to AWS services that the only way to test them is to deploy them. Local development is painful. The test pyramid has been replaced by “deploy and see what happens.”
If this sounds like your organisation, serverless isn’t the problem. The problem is a lack of engineering discipline — and that would sink any architecture, containers included.
What disciplined serverless actually looks like
The teams that run serverless well treat it like any other distributed system. They apply the same architectural principles you’d use for microservices — bounded contexts, explicit contracts, separation of concerns — and then deploy the result as functions instead of containers.
Keep logic out of handlers
This is the single most important pattern. Your Lambda handler should be a thin adapter — it parses the event, calls your domain logic, and returns a response. The domain logic lives in its own module with no AWS dependencies. Pure functions that take inputs and return outputs.
```python
# adapters/handlers/create_invoice.py
@with_correlation
def handler(event, _ctx):
    cmd = parse_api_event(event)        # parse the AWS event
    repo = DdbInvoices()                # adapter for DynamoDB
    result = create_invoice(cmd, repo)  # pure domain call
    return http_ok(result)
```

The domain function is completely testable without AWS:
```python
def test_create_invoice_taxes_rounding():
    repo = InMemoryInvoices()
    out = create_invoice(Cmd(total=100, country="GB"), repo)
    assert out.vat == 20 and repo.saved_count == 1
```

This is hexagonal architecture, originally described by Alistair Cockburn as “Ports and Adapters.” It’s not new. But in the serverless world, people skip it because the functions are “so simple they don’t need structure.” That’s how you end up with untestable spaghetti.
Deploy packages, not random functions
Each bounded context gets its own infrastructure module — Terraform, CDK, or SAM. Versioned. Reviewed. Deployed as a unit. Not hand-wired through the console.
Shared concerns — logging, tracing, metrics, error handling — live in a shared layer or internal package. Powertools for AWS Lambda is excellent for this. Every function gets structured logs, correlation IDs, and metrics for free.
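In practice you'd reach for Powertools rather than rolling your own, but to make the idea concrete, here is a hypothetical stdlib-only sketch of the `@with_correlation` decorator used in the handler example — it propagates an upstream correlation ID or mints one, and emits structured log lines around each invocation:

```python
import json
import logging
import uuid
from functools import wraps

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

def with_correlation(handler):
    """Attach a correlation ID to every invocation and log entry/exit.

    Hypothetical shared-layer code; real projects should prefer
    Powertools' Logger, which does this (and more) out of the box.
    """
    @wraps(handler)
    def wrapper(event, ctx):
        # Reuse an upstream ID if the caller sent one, otherwise mint one.
        corr_id = (event.get("headers") or {}).get("x-correlation-id") or str(uuid.uuid4())
        event["correlation_id"] = corr_id
        logger.info(json.dumps({"event": "invoke", "correlation_id": corr_id}))
        try:
            return handler(event, ctx)
        except Exception:
            logger.exception(json.dumps({"event": "error", "correlation_id": corr_id}))
            raise
    return wrapper
```

Because it lives in a shared package, every function in every bounded context gets the same log shape, which is what makes cross-function tracing possible at all.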
Orchestrate explicitly
Use Step Functions for business workflows instead of daisy-chaining events. Step Functions give you a visual model of the flow, built-in retries and error handling, and execution history you can actually debug.
Use EventBridge for genuine pub/sub — with a schema registry and versioned event contracts. Not as a way to avoid thinking about how services communicate.
The rule of thumb: if A needs B to happen next, use orchestration. If A doesn’t care who’s listening, use events. Most teams use events for everything because it “feels more decoupled.” It’s not decoupled — it’s just harder to debug.
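The distinction is easier to see in code. A contrived, plain-Python sketch — the function and class names are invented for illustration, standing in for Step Functions and EventBridge respectively:

```python
# Orchestration: "A needs B to happen next." The workflow owns the order,
# so sequencing and failure handling live in one visible place.
def fulfil_order(order, charge, ship):
    payment = charge(order)      # step 1 must succeed...
    return ship(order, payment)  # ...before step 2 runs at all

# Pub/sub: "A doesn't care who's listening." The publisher has no idea
# how many subscribers exist, or whether any do.
class Bus:
    def __init__(self):
        self.subscribers = {}
    def subscribe(self, topic, fn):
        self.subscribers.setdefault(topic, []).append(fn)
    def publish(self, topic, event):
        for fn in self.subscribers.get(topic, []):
            fn(event)
```

If `fulfil_order` is modelled as events instead, the "payment must precede shipping" rule still exists — it's just scattered across subscribers where no one can see it.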
The testing strategy that actually works
Here’s the uncomfortable truth about serverless testing: you do need to deploy to AWS to test properly. The cloud is your runtime. Trying to fully simulate API Gateway, EventBridge, Step Functions, and IAM locally is a fool’s errand.
But you don’t deploy to production. You deploy to a throwaway environment:
80-90% runs locally, fast, no cloud needed. Domain logic unit tests. Contract tests for APIs (OpenAPI) and events (JSON Schema). Handler tests with mocked adapters. This is where your test pyramid lives.
10-20% runs against real AWS. Ephemeral environments per pull request — IaC spins up a temporary stack in a sandbox account, runs end-to-end tests, and tears it down on merge. You’re testing real integrations, but in isolation.
Production releases use canary deployments. CodeDeploy aliases shift traffic gradually — 1%, 10%, 100% — with automatic rollback on error rate or latency alarms. If bad code gets through, it hits 1% of traffic before anyone notices.
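The contract tests in the local tier deserve a concrete example. A real project would validate events against a registered JSON Schema with a library like `jsonschema`; this hand-rolled stdlib sketch (with a hypothetical `invoice.created` event) just shows the shape of the check:

```python
import json

# Hypothetical v1 contract for an "invoice.created" event.
# Real projects would express this as JSON Schema in a registry.
INVOICE_CREATED_V1 = {
    "required": {
        "event_type": str,
        "version": int,
        "invoice_id": str,
        "total": (int, float),
    },
}

def conforms(event: dict, contract: dict) -> bool:
    """True if every required field is present with the right type."""
    return all(
        name in event and isinstance(event[name], typ)
        for name, typ in contract["required"].items()
    )

sample = json.loads(
    '{"event_type": "invoice.created", "version": 1,'
    ' "invoice_id": "inv-42", "total": 100.0}'
)
assert conforms(sample, INVOICE_CREATED_V1)
assert not conforms({"event_type": "invoice.created"}, INVOICE_CREATED_V1)
```

Run on every producer and every consumer in CI, checks like this catch schema drift locally, long before the ephemeral-environment tier ever spins up.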
Platform guardrails
The best serverless organisations have a small platform team that publishes golden templates: base function scaffolds, CI/CD pipelines, standard alarms and dashboards, IAM policies, tagging rules, and cost controls.
Product teams fill in the domain code and configuration. They don’t make infrastructure decisions — those are made once, well, and enforced through templates.
When to skip serverless entirely
None of the above helps if serverless is the wrong tool for the job:
- Long-lived, stateful workflows with lots of shared state — use containers or a workflow engine like Temporal.
- Heavy compute — ML inference, large batch processing, sustained throughput — containers or Fargate.
- Strict latency requirements where cold starts are unacceptable and provisioned concurrency is too expensive.
- Teams that can’t or won’t adopt the discipline above. This is the most common reason. If your organisation doesn’t have the engineering maturity to maintain contracts, hexagonal architecture, and automated testing, serverless will make things worse, not better. Stick to containers.
The pragmatic middle path
What I usually recommend — and what I’ve done in practice across multiple engagements — is a split:
Core APIs and shared services → containers on Kubernetes. You get consistent local development, straightforward testing, and an operational model the team already understands.
Edge and async glue → serverless with the discipline above. Scheduled jobs, ETL triggers, event processors, webhook handlers. The stuff that’s genuinely event-driven and benefits from scale-to-zero.
A platform team provides the guardrails for both. Golden templates, CI/CD, observability, security controls. Product teams own domains end-to-end.
Serverless isn’t inherently unmaintainable. But it requires more upfront discipline than most teams expect — and significantly more than the “just write a Lambda” marketing suggests. If you’re going to do it, do it properly. If you’re not willing to invest in the engineering practices that make it work, containers are the safer bet.
And if someone tells you they run 500 Lambdas in production with no issues, ask them how they test. The answer will tell you everything you need to know.
