Key Concepts
Job
A controller that creates pods and ensures a specified number complete successfully.
Completions
The number of pods that must complete successfully before the Job is done.
Parallelism
The maximum number of pods that can run simultaneously.
Backoff Limit
The number of retries before marking the Job as failed.
Required Permissions
| Action | Permission |
|---|---|
| View jobs | iam:project:infrastructure:kubernetes:read |
| Create job | iam:project:infrastructure:kubernetes:write |
| Edit job | iam:project:infrastructure:kubernetes:write |
| Delete job | iam:project:infrastructure:kubernetes:delete |
Job Status Values
| Status | Description |
|---|---|
| Complete | All required completions succeeded |
| Failed | Job exceeded backoff limit or deadline |
| Running | Pods are actively running |
| Pending | Job created but no pods started yet |
Job Metrics
| Metric | Description |
|---|---|
| Completions | Target number of successful pods (e.g., 3/5) |
| Succeeded | Number of pods that completed successfully |
| Failed | Number of pods that failed |
| Active | Number of pods currently running |
| Duration | Time from Job start to completion |
How to View Jobs
How to View Job Details
How to Create a Job
Write YAML
Enter the Job manifest in YAML format. Key fields:
- `spec.completions` - Number of successful completions required
- `spec.parallelism` - Max concurrent pods
- `spec.backoffLimit` - Max retries before failure
- `spec.template` - Pod template specification
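
The key fields above can be put together as in the following minimal sketch. The Job name, image, and command are placeholders chosen for illustration, not values taken from this platform.

```yaml
# Illustrative Job manifest; name, image, and command are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch            # hypothetical name
spec:
  completions: 5                 # five pods must finish successfully
  parallelism: 2                 # at most two pods run at the same time
  backoffLimit: 4                # retries allowed before the Job is marked Failed
  template:
    spec:
      restartPolicy: Never       # Job pods must use Never or OnFailure
      containers:
        - name: worker
          image: busybox:1.36    # assumed image
          command: ["sh", "-c", "echo processing && sleep 5"]
```

Note that `restartPolicy` sits inside the pod template and must be `Never` or `OnFailure`; the default `Always` is rejected for Jobs.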
How to Edit a Job
How to Delete a Job
Deleting a Job uses the Background propagation policy by default. Pods created by the Job are deleted asynchronously.
Job Completion Modes
| Mode | Description |
|---|---|
| Non-indexed | Job completes when the number of succeeded pods reaches `spec.completions` (default; the API value is `NonIndexed`) |
| Indexed | Each pod gets an index (0 to completions-1), useful for parallel processing with distinct work items |
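
As a sketch of Indexed mode, the manifest below runs three pods with indexes 0 to 2. Kubernetes exposes the index to each pod through the `JOB_COMPLETION_INDEX` environment variable; the Job name, image, and command are illustrative assumptions.

```yaml
# Illustrative Indexed Job; name, image, and command are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-indexed          # hypothetical name
spec:
  completionMode: Indexed        # each pod receives a distinct completion index
  completions: 3                 # indexes 0, 1, and 2
  parallelism: 3                 # run all three indexes at once
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36    # assumed image
          # JOB_COMPLETION_INDEX is set by Kubernetes for Indexed Jobs.
          command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```

Each pod can use its index to select a distinct work item, such as one shard of an input file.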
Parallelism and Completions
| Setting | Behavior |
|---|---|
| `completions: 1`, `parallelism: 1` | Single pod, runs once (default) |
| `completions: N`, `parallelism: 1` | Sequential execution of N pods |
| `completions: N`, `parallelism: M` | Up to M pods run in parallel until N complete |
| `completions` unset, `parallelism: N` | Work queue pattern - up to N pods run; the Job completes once at least one pod succeeds and all pods have exited |
Troubleshooting
Job stuck in Pending
- Check if namespace has resource quotas blocking pod creation
- Verify the container image exists and is accessible
- Check for missing ConfigMaps, Secrets, or PVCs
- Review Job events for scheduling errors
Job keeps failing (backoff limit reached)
- Check pod logs for application errors
- Verify command and arguments are correct
- Check if required environment variables are set
- Review resource limits - pods may be OOMKilled
- Increase `backoffLimit` if retries are expected
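
For the last two checks, a spec fragment along these lines shows where memory limits and the retry budget are set. The values, image, and container name are illustrative assumptions and should be sized to the actual workload.

```yaml
# Illustrative fragment of a Job spec; all values are assumptions.
spec:
  backoffLimit: 10               # raised from the default of 6 to allow more retries
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker           # hypothetical container name
          image: busybox:1.36    # assumed image
          resources:
            requests:
              memory: "128Mi"    # amount the scheduler reserves for the pod
              cpu: "250m"
            limits:
              memory: "256Mi"    # exceeding this gets the container OOMKilled
```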
Job running longer than expected
- Set `spec.activeDeadlineSeconds` to limit total runtime
- Check if pods are stuck waiting for resources
- Review pod logs for slow operations
- Consider increasing parallelism
Completed Jobs accumulating
- Set `ttlSecondsAfterFinished` to auto-delete completed Jobs
- Jobs created by CronJobs are cleaned up by history limits
- Manually delete old Jobs if needed
Pods not being created
- Verify Job status and events
- Check for selector mismatch between Job and pod template
- Review namespace resource quotas
- Check ServiceAccount permissions if using custom accounts
Job completed but pods still running
- This shouldn’t happen normally
- Check if `ttlSecondsAfterFinished` is set
- Manually delete orphaned pods if needed
FAQ
What's the difference between Jobs and Deployments?
Deployments keep pods running indefinitely and replace them if they fail. Jobs run pods to completion and are considered successful once the task finishes. Use Jobs for batch tasks and Deployments for long-running services.
How do I run a Job on a schedule?
Use a CronJob, which creates Jobs on a time-based schedule. The CronJob controller automatically creates Jobs at specified intervals.
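
A minimal CronJob sketch is shown here for orientation. The name, schedule, image, and command are illustrative assumptions; the two history limits are written out with their Kubernetes default values.

```yaml
# Illustrative CronJob; name, schedule, image, and command are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-nightly            # hypothetical name
spec:
  schedule: "0 2 * * *"            # standard cron syntax: every day at 02:00
  successfulJobsHistoryLimit: 3    # completed Jobs to keep (default 3)
  failedJobsHistoryLimit: 1        # failed Jobs to keep (default 1)
  jobTemplate:
    spec:                          # an ordinary Job spec
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: task
              image: busybox:1.36  # assumed image
              command: ["sh", "-c", "date"]
```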
What happens if a Job pod fails?
The Job controller creates a new pod up to `backoffLimit` times. Each retry uses exponential backoff (10s, 20s, 40s…, capped at 6 minutes). After exceeding the limit, the Job is marked Failed.
Can I restart a failed Job?
No. Jobs cannot be restarted. Delete the failed Job and create a new one with the same specification.
How do I limit how long a Job can run?
Set `spec.activeDeadlineSeconds` to specify the maximum runtime in seconds. If the Job exceeds this duration, its running pods are terminated and the Job is marked Failed, regardless of how many completions have succeeded.
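
As a sketch, a ten-minute deadline could be declared as follows. The Job name, image, command, and the 600-second value are illustrative assumptions.

```yaml
# Illustrative Job with a runtime deadline; all names and values are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-deadline         # hypothetical name
spec:
  activeDeadlineSeconds: 600     # the whole Job, retries included, gets 10 minutes
  backoffLimit: 6                # the deadline applies even if retries remain
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36    # assumed image
          command: ["sh", "-c", "sleep 30"]
```

The deadline is measured for the Job as a whole, not per pod, so it also bounds the time spent on retries.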
What is ttlSecondsAfterFinished?
This field automatically deletes completed Jobs after the specified number of seconds. For example, `ttlSecondsAfterFinished: 3600` deletes the Job one hour after completion.
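
The field sits directly under `spec`, as in this sketch. The Job name, image, and command are illustrative assumptions.

```yaml
# Illustrative Job that is cleaned up automatically; names are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-cleanup          # hypothetical name
spec:
  ttlSecondsAfterFinished: 3600  # delete the Job and its pods one hour after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36    # assumed image
          command: ["sh", "-c", "echo done"]
```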
How do I process a work queue with Jobs?
Use a Job with `parallelism` set but `completions` unset. Each pod processes items from a shared queue until the queue is empty, then exits successfully.
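
A work-queue Job might look like the sketch below. The worker image, the `QUEUE_URL` variable, and its value are hypothetical; the queue itself and the logic that drains it live in your own worker code, not in Kubernetes.

```yaml
# Illustrative work-queue Job; image, env var, and queue address are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-queue-workers    # hypothetical name
spec:
  parallelism: 3                 # three workers pull from the queue concurrently
  # completions is intentionally left unset for the work queue pattern
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: worker
          image: registry.example.com/queue-worker:1.0   # hypothetical image
          env:
            - name: QUEUE_URL                            # hypothetical variable read by the worker
              value: "redis://queue.example.svc:6379"    # hypothetical queue address
```

Each worker is expected to exit with status 0 once it finds the queue empty, which lets the Job finish.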
What's the default backoff limit?
The default `backoffLimit` is 6, meaning Kubernetes will retry failed pods up to 6 times before marking the Job as failed.
Can I update a running Job?
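Most fields are immutable. You can update some metadata (labels, annotations) but not the pod template or, for non-indexed Jobs, `completions`. In upstream Kubernetes, `spec.parallelism` is an exception and can be changed while the Job runs. For other changes, delete and recreate the Job.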