Use Spot Instance Fleets the Right Way
- chrisnie4
- Apr 23
You've set up a Spot Fleet with a few instance types. You've enabled Capacity Rebalancing. Your workload is still getting interrupted constantly. Here's what's actually happening.
Spot interruptions don't happen one node at a time.
A capacity pool is one instance type in one AZ. When a pool dries up (m5.large in us-east-1a, for example), AWS can reclaim most or all of your instances in that pool at once. You lose nothing for weeks, then 20 nodes disappear in the same minute.
A fleet of ten instances of the same type in one AZ is still a single pool. You haven't removed the single point of failure, you've just made it bigger.
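To put a number on that single point of failure, here is a minimal sketch that computes the worst-case share of a fleet lost when its largest pool is reclaimed. It assumes a pool is an (instance type, AZ) pair and that a reclaimed pool loses all of its instances at once, which is the pessimistic case. The function name and the node counts are illustrative, not taken from any AWS tooling.

```python
from collections import Counter


def worst_case_loss(instances):
    """Fraction of the fleet lost if the single largest capacity pool is reclaimed.

    `instances` is a non-empty list of (instance_type, availability_zone)
    pairs; each distinct pair is treated as one capacity pool.
    """
    pools = Counter(instances)
    return max(pools.values()) / len(instances)


# 20 nodes, one instance type, one AZ: everything sits in a single pool.
single_pool = [("m5.large", "us-east-1a")] * 20

# 24 nodes spread evenly over 4 types x 3 AZs: 12 pools of 2 nodes each.
types = ["m5.large", "m5a.large", "m6i.large", "m6a.large"]
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
spread = [(t, z) for t in types for z in zones] * 2

print(worst_case_loss(single_pool))  # 1.0: the whole fleet goes at once
print(worst_case_loss(spread))       # 2/24, roughly 8% of the fleet
```

In the single-pool fleet one dried-up pool takes everything. In the spread fleet the same event costs two nodes out of 24.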
The fix is pool-level diversification:
- 10+ different instance types (mix generations: m5, m5a, m6i, m6a — similar specs, different pools)
- Spread across all available AZs
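- Use the price-capacity-optimized allocation strategy: AWS first picks the pools with the most available capacity, then the lowest-priced among them, instead of just chasing the cheapest pool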
When one pool dries up, the others absorb it. Most interruptions become invisible.
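The three recommendations above could be expressed as a Spot Fleet request roughly like this. It is a sketch, not a drop-in config: the subnet IDs, launch template ID, and IAM role ARN are placeholders, the instance-type list is only an example of similarly sized types (check what is offered in your region), and `build_spot_fleet_config` is a hypothetical helper. The block only builds the request dictionary and makes no AWS calls.

```python
from itertools import product


def build_spot_fleet_config(instance_types, subnet_ids, target_capacity,
                            launch_template_id, fleet_role_arn):
    """Build a Spot Fleet request with one override per (type, subnet) pool."""
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in product(instance_types, subnet_ids)
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "Type": "maintain",
        "TargetCapacity": target_capacity,
        # Spot Fleet API spelling of price-capacity-optimized.
        "AllocationStrategy": "priceCapacityOptimized",
        # Capacity Rebalancing: launch a replacement when an instance
        # is flagged as being at elevated risk of interruption.
        "SpotMaintenanceStrategies": {
            "CapacityRebalance": {"ReplacementStrategy": "launch"}
        },
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": launch_template_id,
                "Version": "$Latest",
            },
            "Overrides": overrides,
        }],
    }


# Example values only: 10 similarly sized types across mixed generations.
INSTANCE_TYPES = [
    "m5.large", "m5a.large", "m5d.large", "m5n.large", "m5ad.large",
    "m6i.large", "m6a.large", "m6id.large", "m7i.large", "m7a.large",
]
# Placeholder subnet IDs, one per AZ.
SUBNET_IDS = ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"]

config = build_spot_fleet_config(
    INSTANCE_TYPES, SUBNET_IDS, target_capacity=20,
    launch_template_id="lt-0123456789abcdef0",  # placeholder
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",  # placeholder
)
overrides = config["LaunchTemplateConfigs"][0]["Overrides"]
print(len(overrides))  # 30 pools: 10 instance types x 3 subnets
```

With real IDs filled in, a dictionary shaped like this is what you would pass as `SpotFleetRequestConfig` to boto3's `ec2.request_spot_fleet`. Ten types across three AZs gives the fleet 30 pools to draw from instead of one.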
On Kubernetes, Karpenter handles this diversification natively, but it needs at least 15 candidate instance types before it will do single-node spot-to-spot consolidation. The configs I see in the wild allow only 3–4. That's why you're still getting hit.
Spot only works reliably when you treat it as a pool of capacity, not a list of machines.



