Key Concepts
Rolling Upgrade
Ceph upgrades one daemon at a time, maintaining cluster availability throughout the process.
Target Version
The Ceph version or container image you are upgrading to.
Daemon
A Ceph service process (mon, mgr, osd, mds, rgw) that gets upgraded individually.
Upgrade Blockers
Conditions that prevent an upgrade from starting, such as cluster health errors.
Required Permissions
| Action | Permission |
|---|---|
| View Upgrade Status | iam:project:infrastructure:ceph:read |
| Start Upgrade | iam:project:infrastructure:ceph:write |
| Pause Upgrade | iam:project:infrastructure:ceph:write |
| Resume Upgrade | iam:project:infrastructure:ceph:write |
| Stop Upgrade | iam:project:infrastructure:ceph:write |
| View History | iam:project:infrastructure:ceph:read |
Upgrade Status
| Status | Description |
|---|---|
| Up to Date | Cluster is running the latest available version |
| Available | Newer version is available for upgrade |
| In Progress | Upgrade is currently running |
| Paused | Upgrade is paused and can be resumed |
How to Check Available Upgrades
View Current Status
Review the statistics cards showing:
- Current Version: The version currently running
- Target: Available upgrade target or “Up to Date”
- Progress: Percentage complete (if upgrading)
- Daemons: Number of daemons in the cluster
- Health: Current cluster health status
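The same information is available from the command line. A minimal sketch, assuming a cephadm-managed cluster (these are standard Ceph commands, not specific to this UI):

```bash
# Current cluster version as reported by the monitors
ceph version

# Cluster health, daemon counts, and overall status at a glance
ceph -s
```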
How to Start an Upgrade
Pre-flight Check
The wizard checks cluster health and readiness:
- Cluster health status (HEALTH_OK, HEALTH_WARN, HEALTH_ERR)
- Critical blockers are highlighted
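You can run the same pre-flight checks by hand. A sketch, assuming cephadm manages the cluster; the image URL is a placeholder:

```bash
# Show any health warnings or errors that could block the upgrade
ceph health detail

# Ask cephadm to verify that every host can pull and run the target image
ceph orch upgrade check --image quay.io/ceph/ceph:v18.2.4
```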
Select Version
Choose your upgrade target:
- Available Versions: Pre-defined versions from the registry
- Custom Image: Specify a container image URL directly
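If you prefer the CLI on a cephadm-managed cluster, a registry-listed version can be selected by version number; the version shown is an example:

```bash
# Start a rolling upgrade to a specific Ceph release
ceph orch upgrade start --ceph-version 18.2.4
```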
How to Use Custom Image Upgrades
Custom image upgrades allow you to specify an exact container image URL, useful for:
- Testing pre-release versions
- Using custom-built images
- Air-gapped environments with private registries
Enter Image URL
Enter the full container image URL, for example:
- quay.io/ceph/ceph:v18.2.4
- registry.example.com/ceph/ceph:v18.2.4
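The cephadm CLI equivalent accepts the image URL directly; a sketch using the example registry above:

```bash
# Start a rolling upgrade from an explicit container image
ceph orch upgrade start --image registry.example.com/ceph/ceph:v18.2.4
```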
How to Pause an Upgrade
Pausing stops the upgrade at the current daemon while maintaining cluster stability.
Click Pause
Click Pause to stop the upgrade process. The current daemon will complete before pausing.
Pausing is useful when you need to investigate issues or perform other maintenance. The cluster remains operational during the pause.
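If you are working from a shell on a cephadm-managed cluster, the equivalent operation is:

```bash
# Pause the running upgrade; the daemon currently being upgraded finishes first
ceph orch upgrade pause
```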
How to Resume an Upgrade
Click Resume to continue a paused upgrade. The process picks up with the next daemon in the sequence.
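The cephadm CLI equivalent:

```bash
# Resume a paused upgrade from where it left off
ceph orch upgrade resume
```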
How to Stop an Upgrade
Stopping an upgrade cancels the process entirely. Daemons that have already been upgraded keep the new version; the rest remain on the old version.
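The cephadm CLI equivalent:

```bash
# Cancel the upgrade entirely; already-upgraded daemons keep the new version
ceph orch upgrade stop
```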
How to View Upgrade History
Review Past Upgrades
The history table shows:
- From/To versions
- Status (success, failed, cancelled)
- Start and completion times
- Duration
- Number of daemons upgraded
Statistics Cards
Current Version
The Ceph version currently running on the cluster (e.g., 18.2.2 reef).
Target/Available
Shows either:
- Up to Date: No newer version available
- Available: A newer version is available to upgrade to
- Target: The version being upgraded to (during upgrade)
Progress
Percentage of daemons that have been upgraded. Only shown during active upgrades.
Daemons
Total number of Ceph daemons that will be upgraded.
Health
Current cluster health status:
- HEALTH_OK: All healthy
- HEALTH_WARN: Warnings present (upgrade can proceed with caution)
- HEALTH_ERR: Critical issues (upgrade blocked)
Upgrade Order
Ceph upgrades daemons in a specific order to maintain cluster stability:
- Managers (mgr) - Updated first as they coordinate the upgrade
- Monitors (mon) - Quorum is maintained throughout
- OSDs - Updated one at a time to preserve data availability
- Metadata Servers (mds) - For CephFS clusters
- RADOS Gateways (rgw) - For object storage clusters
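On a cephadm-managed cluster you can watch this sequence as it runs; a sketch:

```bash
# Show the target image, progress percentage, and which services are done
ceph orch upgrade status

# Optionally follow the orchestrator's log as daemons are redeployed
ceph -W cephadm
```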
Troubleshooting
Upgrade shows 'Upgrade Blocked'
- Check cluster health with `ceph health detail`
- Resolve any HEALTH_ERR conditions
- Ensure all OSDs are up and in
- Verify no other operations are in progress
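A few standard commands cover the OSD checks; a sketch:

```bash
# Confirm every OSD is both up and in
ceph osd stat

# Locate any OSDs that are currently down
ceph osd tree | grep down
```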
Upgrade is very slow
- Each daemon restart takes time
- Large OSDs may take longer to restart
- Check for recovery/backfill operations
- Network bandwidth affects image download speed
Upgrade stuck on a daemon
- Check the daemon’s host is accessible
- Verify the container image is available
- Review daemon logs on the host
- Consider pausing and investigating
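A sketch of the investigation on a cephadm-managed cluster; `<daemon-name>` is a placeholder such as `osd.12`:

```bash
# Check which host runs the stuck daemon and its current state
ceph orch ps

# On that host, inspect the daemon's container logs
cephadm logs --name <daemon-name>
```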
Cluster unhealthy after partial upgrade
- Mixed versions are supported but not ideal
- Resume the upgrade to complete it
- If issues persist, check version compatibility
- Contact support for rollback procedures
Custom image not pulling
- Verify the image URL is correct
- Check registry authentication
- Ensure nodes can reach the registry
- Test with `podman pull <image>` on a node, as in the sketch below
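A quick way to test pulls by hand, assuming podman and a private registry that requires credentials; the registry and image names are placeholders:

```bash
# Authenticate against the private registry, if it requires credentials
podman login registry.example.com

# Try pulling the exact image the upgrade is configured to use
podman pull registry.example.com/ceph/ceph:v18.2.4
```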
Cannot find available versions
- Versions are managed by the system administrator
- Check if versions are marked as active in settings
- Verify your cluster’s current version name matches version groups
FAQ
How long does an upgrade take?
Upgrade duration depends on:
- Number of daemons in the cluster
- OSD sizes (larger OSDs take longer to restart)
- Network speed for image downloads
- Any recovery operations that occur
Is the cluster available during upgrade?
Yes. Ceph’s rolling upgrade process maintains cluster availability:
- Only one daemon is upgraded at a time
- Monitor quorum is preserved
- Data redundancy protects against OSD restarts
What if I need to roll back?
Ceph doesn’t support automatic rollback. Options include:
- Restore from backup (if available)
- Manually reinstall previous version (complex)
- Complete the upgrade and address issues
Can I skip versions?
Generally, you should upgrade to the next major version only. For example:
- Quincy → Reef (supported)
- Pacific → Reef (not recommended, upgrade to Quincy first)
What is the recommended version?
The recommended version is determined by:
- Latest stable release in your version series
- System administrator configuration
- Known compatibility with your environment
Should I upgrade if cluster shows HEALTH_WARN?
It depends on the warning:
- Minor warnings (clock skew, nearfull): Usually safe to proceed
- Data-related warnings (degraded PGs): Resolve first
- OSD warnings (down OSDs): Fix before upgrading
What happens if power fails during upgrade?
Ceph is resilient to interruptions:
- Completed daemons retain their new version
- In-progress daemon may need manual recovery
- The upgrade can be resumed once power is restored
- Data is protected by replication
How do I know the upgrade completed successfully?
Signs of successful upgrade:
- Progress shows 100%
- Current Version matches Target Version
- Cluster health returns to HEALTH_OK
- All daemons show the new version
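A sketch of verifying the last two points from the CLI:

```bash
# Every daemon should report the same, new version
ceph versions

# Health should be back to HEALTH_OK
ceph health
```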