Jack Anderson

Autoscaling a Kubernetes Deployment

2 minute read

I’ve been working on an autoscaling demo for a while now. To do this, I’ve put together a homelab cluster, built a service that hogs every ounce of CPU it can get, and deployed it.

The baseline

BaaS (Bogosort as a service) is designed to be incredibly inefficient. Even a relatively simple request should pin the CPU and hit any limits we’ve established. The deployment definition is as follows:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: baas
  namespace: baas
spec:
  replicas: 1
  selector:
    matchLabels:
      app: baas
  template:
    metadata:
      labels:
        app: baas
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: app
          image: jelkand/baas:latest
          ports:
            - containerPort: 3000
          imagePullPolicy: Always
          resources:
            requests:
              memory: '128Mi'
              cpu: '250m'
            limits:
              memory: '256Mi'
              cpu: '500m'

Note that when we use about 50% of a single CPU, we should be hitting the 500m limit. It’s a soft limit, so Kubernetes will let us go above it for a while, but it’s a limit nonetheless.

Also note that we’ve asked for a single replica, and that’s exactly what we get when this is deployed. I’m excluding a couple of load balancers from this list, but they’re there too.

Single pod on the dashboard

Here’s the request we’re sending over via Insomnia. Note that sorting 13 integers takes a minute and a half and nearly 67 million shuffles.

Request via Insomnia
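For a sense of why 13 integers take a minute and a half: bogosort checks whether the list is sorted, and if not, shuffles it and tries again, so the expected number of shuffles grows factorially with the input size. The actual BaaS code isn’t shown in this post, so treat this Python sketch as illustrative only:

```python
import random

def bogosort(xs: list[int]) -> tuple[list[int], int]:
    """Shuffle until sorted; return the sorted list and the shuffle count.

    Illustrative sketch only -- the real BaaS implementation isn't shown here.
    """
    shuffles = 0
    # Loop while any adjacent pair is out of order.
    while any(a > b for a, b in zip(xs, xs[1:])):
        random.shuffle(xs)
        shuffles += 1
    return xs, shuffles

result, count = bogosort([3, 1, 2])
```

Even at three elements you’ll see the shuffle count bounce around between runs; at thirteen, the averages land in the tens of millions and beyond.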

As we might expect, sending a request takes our CPU usage past the limit, up to a full CPU of utilization (the most a single pod can use, since the service is single-threaded). We haven’t added an autoscaler yet, so we still see only one pod.

Pinned CPU with a request


Let’s add a horizontal pod autoscaler to the definition.

We’ll scale up to a maximum of 5 replicas, scaling out whenever average utilization hits 80 percent of the requested CPU.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: baas
  namespace: baas
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: baas
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80
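Under the hood, the HPA’s target-tracking math is simple. Per the formula in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A quick Python sketch, just to illustrate the arithmetic:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Replica count the HPA aims for, per the Kubernetes docs' formula."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# One pod pinned at a full CPU (1000m) against its 250m request is 400%
# utilization; with an 80% target that works out to 5 replicas:
print(desired_replicas(1, 400, 80))  # → 5
```

So a single pinned pod against a 250m request drives the HPA straight to 5 replicas, which happens to be our maxReplicas.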

Applying this shows us:

horizontalpodautoscaler.autoscaling/baas created

When we fire off another request, we see four more pods spin up to handle the load, just what we’d expect. Also note that they’re distributed across my two agent nodes.
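If you’d rather watch the scale-up from a terminal than the dashboard, a couple of standard kubectl commands do the trick:

    # Watch the HPA's observed utilization and replica count update live
    kubectl get hpa baas -n baas --watch

    # List the pods, including which node each one landed on
    kubectl get pods -n baas -o wide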

Five pods spinning up to handle the load

In our case, each new pod lets us handle one more concurrent request, but every request is still painfully slow. There’s only so much the autoscaler can do when the service is this bad.

Hopefully this has been an interesting exercise. If you’d like to chat about it, have questions, or have feedback, please don’t hesitate to say hi at blog@jackanderson.dev!