
Use Spot Fleets the Right Way

  • chrisnie4
  • Apr 23
  • 1 min read

You've set up a Spot Fleet with a few instance types. You've enabled Capacity Rebalancing. Your workload is still getting interrupted constantly. Here's what's actually happening.


Spot interruptions don't happen one node at a time.


A capacity pool is one instance type in one Availability Zone. When a pool dries up (m5.large in us-east-1a, for example), the instances in that pool tend to be reclaimed together. You lose nothing for weeks, then 20 nodes disappear in the same minute.


A fleet with 10 times as many instances of the same type in one AZ just means your single point of failure got bigger.
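To put a number on that, here is a minimal sketch that measures the share of a fleet sitting in its largest single pool, which is what you lose if that pool is reclaimed at once. The function name `worst_case_loss` and the even round-robin placement are assumptions made for illustration, not how AWS places instances.

```python
from collections import Counter


def worst_case_loss(fleet):
    """Return the largest fraction of the fleet that sits in one capacity pool.

    `fleet` is a list of (instance_type, availability_zone) tuples, one per node.
    """
    pools = Counter(fleet)
    return max(pools.values()) / len(fleet)


# 20 nodes, all in a single pool.
concentrated = [("m5.large", "us-east-1a")] * 20

# 20 nodes spread round-robin over 4 types x 3 AZs = 12 pools (illustrative).
types = ["m5.large", "m5a.large", "m6i.large", "m6a.large"]
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
pool_list = [(t, z) for t in types for z in zones]
diversified = [pool_list[i % len(pool_list)] for i in range(20)]

print(worst_case_loss(concentrated))  # 1.0
print(worst_case_loss(diversified))   # 0.1
```

With one pool, a single dry-up can take the whole fleet. With twelve pools, the same event costs at most two of the twenty nodes in this example.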


The fix is pool-level diversification:


- 10+ different instance types (mix generations: m5, m5a, m6i, m6a — similar specs, different pools)

- Spread across all available AZs

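- Use the price-capacity-optimized allocation strategy: AWS favors the pools with the most available capacity, then picks the lowest-priced among them, instead of just chasing the cheapest pool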

When one pool dries up, the others absorb it. Most interruptions become invisible.
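As a rough sketch of what the checklist above looks like in practice, the snippet below builds the parameters for an EC2 Fleet request with one override per instance type and subnet pair. The instance type list, the subnet IDs, the launch template name, and the helper `build_fleet_request` are all placeholders I've made up for illustration; swap in your own.

```python
from itertools import product

# Similar-sized types across generations and vendors (illustrative list).
INSTANCE_TYPES = [
    "m5.large", "m5a.large", "m5d.large", "m5n.large", "m5ad.large",
    "m6i.large", "m6a.large", "m6id.large", "m6in.large",
    "m7i.large", "m7a.large",
]

# One subnet per AZ. These IDs are hypothetical placeholders.
SUBNETS_BY_AZ = {
    "us-east-1a": "subnet-aaaa1111",
    "us-east-1b": "subnet-bbbb2222",
    "us-east-1c": "subnet-cccc3333",
}


def build_fleet_request(launch_template_name, target_capacity):
    """Build kwargs for an EC2 Fleet request covering every type x AZ pool."""
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in product(INSTANCE_TYPES, SUBNETS_BY_AZ.values())
    ]
    return {
        "Type": "maintain",
        "SpotOptions": {"AllocationStrategy": "price-capacity-optimized"},
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target_capacity,
            "DefaultTargetCapacityType": "spot",
        },
    }


request = build_fleet_request("my-spot-template", 20)
print(len(request["LaunchTemplateConfigs"][0]["Overrides"]), "pools")
```

If you use boto3, a dict shaped like this could be passed as `boto3.client("ec2").create_fleet(**request)`. Treat that as a starting point and check the parameters against the current API reference before running it.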

On Kubernetes, Karpenter handles this diversification natively, but it needs at least 15 instance type options before it will do single-node Spot-to-Spot consolidation. The default configs I see in the wild allow 3–4. That's why you're still getting hit.
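A quick way to sanity-check a config is to count how many instance types it actually allows. The sketch below assumes the NodePool pins types with an explicit `node.kubernetes.io/instance-type` requirement using the `In` operator. It does not expand instance-family or instance-size requirements, and the helper name is hypothetical.

```python
MIN_TYPES_FOR_SPOT_TO_SPOT = 15  # threshold quoted in this post


def allowed_instance_types(requirements):
    """Count explicitly listed instance types in NodePool-style requirements.

    Returns None when no explicit instance-type list is present, in which
    case this simple check cannot tell how flexible the NodePool is.
    """
    for req in requirements:
        if (req.get("key") == "node.kubernetes.io/instance-type"
                and req.get("operator") == "In"):
            return len(set(req.get("values", [])))
    return None


# A typical narrow config: Spot only, four instance types.
requirements = [
    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["spot"]},
    {
        "key": "node.kubernetes.io/instance-type",
        "operator": "In",
        "values": ["m5.large", "m5a.large", "m6i.large", "m6a.large"],
    },
]

count = allowed_instance_types(requirements)
if count is not None and count < MIN_TYPES_FOR_SPOT_TO_SPOT:
    print(f"Only {count} instance types; widen the list to "
          f"{MIN_TYPES_FOR_SPOT_TO_SPOT}+ for Spot-to-Spot consolidation")
```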

Spot only works reliably when you treat it as a pool of capacity, not a list of machines.



 
 
 
