Key Concepts
| Concept | Description |
|---|---|
| HPA | A controller that automatically adjusts the number of pod replicas based on observed metrics. |
| Target | The workload resource (Deployment, ReplicaSet, or StatefulSet) that the HPA scales. |
| Metrics | The measurements (CPU, memory, or custom metrics) used to determine scaling decisions. |
| Replicas | The number of pod instances, bounded by minReplicas and maxReplicas. |
Required Permissions
| Action | Permission |
|---|---|
| View HPAs | iam:project:infrastructure:kubernetes:read |
| Create HPA | iam:project:infrastructure:kubernetes:write |
| Edit HPA | iam:project:infrastructure:kubernetes:write |
| Delete HPA | iam:project:infrastructure:kubernetes:delete |
HPA Status Values
| Status | Description |
|---|---|
| Active | HPA is active and current replicas match desired replicas |
| ScalingUp | HPA is scaling up (current < desired replicas) |
| ScalingDown | HPA is scaling down (current > desired replicas) |
| Inactive | HPA is inactive (desired replicas is 0) |
| ScalingLimited | Scaling is limited by min/max replica bounds |
| Unknown | Status cannot be determined |
How to View HPAs
How to View HPA Details
How to Create an HPA
Write YAML
Enter the HPA manifest in YAML format. Key fields (a sample manifest follows this list):
- `spec.scaleTargetRef` - Target workload to scale
- `spec.minReplicas` - Minimum replica count
- `spec.maxReplicas` - Maximum replica count
- `spec.metrics` - Metrics to trigger scaling
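For reference, a minimal manifest might look like the sketch below. It assumes a Deployment named `web` exists in the same namespace and that its containers define CPU requests; adjust names and values for your workload.

```yaml
# Minimal HPA sketch; the Deployment name "web" and all values are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:            # target workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2             # lower bound
  maxReplicas: 10            # upper bound
  metrics:                   # metrics that trigger scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```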
How to Edit an HPA
Modify Spec
Edit the HPA specification. Common changes:
- Adjust min/max replicas
- Change metric thresholds
- Add or remove metrics
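The same changes can also be applied with kubectl, for example (a sketch reusing the hypothetical `web-hpa` from the create example):

```shell
# Raise the replica bounds on an existing HPA (the name is illustrative).
kubectl patch hpa web-hpa --type merge -p '{"spec":{"minReplicas":3,"maxReplicas":20}}'

# Or open the full spec in an editor.
kubectl edit hpa web-hpa
```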
How to Delete an HPA
Metric Types
HPAs support several metric types:
| Type | Description | Example |
|---|---|---|
| Resource | CPU or memory utilization | CPU at 80% |
| Pods | Custom metrics from pods | Requests per second |
| Object | Metrics from other Kubernetes objects | Queue length |
| External | Metrics from external systems | Cloud queue depth |
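These types correspond to entries under `spec.metrics`. The sketch below shows one entry of each type; the metric names (`http_requests_per_second`, `queue_messages_ready`, `queue_depth`) and the `worker-queue` Service are illustrative, and the Pods, Object, and External entries assume a custom or external metrics provider is installed.

```yaml
metrics:
- type: Resource                      # CPU or memory utilization
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 80
- type: Pods                          # custom metric averaged across pods
  pods:
    metric:
      name: http_requests_per_second  # illustrative metric name
    target:
      type: AverageValue
      averageValue: "100"
- type: Object                        # metric from another Kubernetes object
  object:
    metric:
      name: queue_messages_ready      # illustrative metric name
    describedObject:
      apiVersion: v1
      kind: Service
      name: worker-queue              # illustrative object name
    target:
      type: Value
      value: "30"
- type: External                      # metric from an external system
  external:
    metric:
      name: queue_depth               # illustrative metric name
    target:
      type: AverageValue
      averageValue: "50"
```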
Resource Metrics
Custom Metrics
Scaling Behavior
HPA v2 supports configuring scaling behavior:
| Setting | Description |
|---|---|
| stabilizationWindowSeconds | Time to wait before scaling (prevents flapping) |
| policies | Rules for how quickly to scale |
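A sketch of a `spec.behavior` block combining both settings (the values are illustrative starting points, not recommendations):

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # consider the last 5 minutes before removing pods
    policies:
    - type: Pods
      value: 1                        # remove at most 1 pod per period
      periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load spikes immediately
    policies:
    - type: Percent
      value: 100                      # at most double the replica count per period
      periodSeconds: 60
```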
Troubleshooting
HPA shows 'unknown' for current metrics
- Verify metrics-server is installed and running
- Check target pods have resource requests defined (see the example after this list)
- Wait for metrics collection (can take a few minutes)
- Verify metrics API is accessible: `kubectl top pods`
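If the target pods lack resource requests, HPA cannot compute utilization percentages. A minimal sketch of adding requests to a Deployment's pod template (the container name, image, and values are placeholders):

```yaml
spec:
  template:
    spec:
      containers:
      - name: app                            # placeholder container name
        image: registry.example.com/app:1.0  # placeholder image
        resources:
          requests:
            cpu: 250m       # required for CPU utilization targets
            memory: 256Mi   # required for memory utilization targets
```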
HPA not scaling up
- Check whether current replicas already equal maxReplicas (at the limit)
- Verify metric thresholds are being exceeded
- Check HPA conditions for errors (see the commands after this list)
- Ensure target workload exists and is not paused
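To review conditions and recent events, inspecting the HPA directly is usually the quickest check (replace `web-hpa` with your HPA's name):

```shell
kubectl describe hpa web-hpa      # current metrics, conditions, and recent events
kubectl get hpa web-hpa -o yaml   # full object, including status.conditions
```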
HPA not scaling down
- Check whether current replicas already equal minReplicas (at the minimum)
- Verify stabilization window has passed
- Check scale-down policies if configured
- Review HPA events for scaling decisions
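Scaling decisions are recorded as cluster events. One way to review them, using standard kubectl field-selector and sort options:

```shell
kubectl get events \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler \
  --sort-by=.lastTimestamp
```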
HPA scaling too aggressively
- Increase stabilizationWindowSeconds
- Adjust scale-down policies to be more gradual
- Consider using multiple metrics for better decision making
- Review and tune metric thresholds
Target workload not found
- Verify the target exists in the same namespace
- Check scaleTargetRef name and kind are correct
- Ensure apiVersion matches the target resource
Custom metrics not working
- Verify Prometheus Adapter or custom metrics API is configured
- Check metric name matches exactly
- Ensure metrics are being exported by pods
- Test with `kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`
FAQ
What resources can HPAs scale?
HPAs can scale Deployments, ReplicaSets, and StatefulSets. The target must support the `/scale` subresource; DaemonSets cannot be scaled by HPAs.
How quickly does HPA respond to load changes?
HPA checks metrics every 15 seconds by default (configurable via the `--horizontal-pod-autoscaler-sync-period` flag). Actual scaling depends on stabilization windows and policies.
What happens if I delete an HPA?
The target workload stays at its current replica count. Automatic scaling stops until a new HPA is created or you manually scale the workload.
Can I have multiple HPAs for one Deployment?
No. Only one HPA should target each workload. Multiple HPAs would conflict with each other’s scaling decisions.
What's the difference between HPA v1 and v2?
HPA v2 supports multiple metrics, custom metrics, external metrics, and configurable scaling behavior. v1 only supports CPU and basic scaling. Always use v2 (autoscaling/v2).
Do I need metrics-server for HPA?
Yes, for resource metrics (CPU/memory). Custom metrics require additional components like Prometheus Adapter. External metrics require an external metrics provider.
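One quick check that metrics-server is installed and serving (it is commonly deployed in the `kube-system` namespace, though the name can vary by distribution):

```shell
kubectl -n kube-system get deployment metrics-server   # is it installed and ready?
kubectl top nodes                                      # fails if the metrics API is unavailable
```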
How do I prevent scaling during deployments?
Use the `--horizontal-pod-autoscaler-downscale-stabilization` flag or configure `behavior.scaleDown.stabilizationWindowSeconds` to delay scale-down decisions.
What if my pods don't have resource requests?
HPA cannot calculate utilization percentages without resource requests. Define CPU/memory requests on your containers for HPA to work correctly.