The Upgrade page allows you to upgrade your Ceph cluster to newer versions. Upgrades are performed in a controlled, rolling manner, updating one daemon at a time to minimize disruption. You can monitor progress, pause, resume, or stop an upgrade as needed.
## Key Concepts
- **Rolling Upgrade**: Daemons are upgraded one at a time while the cluster stays online, so services remain available throughout.
- **Target Version**: The Ceph release (or container image) the cluster is being upgraded to.
- **Daemon**: An individual Ceph service process, such as a monitor (mon), manager (mgr), OSD, metadata server (mds), or RADOS gateway (rgw).
- **Upgrade Blockers**: Conditions, such as HEALTH_ERR or down OSDs, that must be resolved before an upgrade can start.
## Required Permissions
| Action | Permission |
|---|---|
| View Upgrade Status | iam:project:infrastructure:ceph:read |
| Start Upgrade | iam:project:infrastructure:ceph:write |
| Pause Upgrade | iam:project:infrastructure:ceph:write |
| Resume Upgrade | iam:project:infrastructure:ceph:write |
| Stop Upgrade | iam:project:infrastructure:ceph:write |
| View History | iam:project:infrastructure:ceph:read |
## Upgrade Status
| Status | Description |
|---|---|
| Up to Date | Cluster is running the latest available version |
| Available | Newer version is available for upgrade |
| In Progress | Upgrade is currently running |
| Paused | Upgrade is paused and can be resumed |
## How to Check Available Upgrades
### View Current Status
The status cards at the top of the page show:
- Current Version: The version currently running
- Target: Available upgrade target or “Up to Date”
- Progress: Percentage complete (if upgrading)
- Daemons: Number of daemons in the cluster
- Health: Current cluster health status
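On cephadm-managed clusters, the same status is available from the CLI; a quick sketch (these commands require a live cluster, so no output is shown):

```shell
# Is an upgrade in progress, and toward which target image/version?
ceph orch upgrade status

# Overall cluster health (HEALTH_OK / HEALTH_WARN / HEALTH_ERR)
ceph health
```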
## How to Start an Upgrade
### Pre-flight Check
Before starting, the page runs a pre-flight check that reports:
- Cluster health status (HEALTH_OK, HEALTH_WARN, HEALTH_ERR)
- Critical blockers are highlighted
### Select Version
- Available Versions: Pre-defined versions from the registry
- Custom Image: Specify a container image URL directly
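For reference, this action maps to `ceph orch upgrade start` on cephadm-based clusters; the version number below is only an example:

```shell
# Begin a rolling upgrade to a named Ceph release
ceph orch upgrade start --ceph-version 18.2.4
```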
## How to Use Custom Image Upgrades
Custom image upgrades allow you to specify an exact container image URL. This is useful for:
- Testing pre-release versions
- Using custom-built images
- Air-gapped environments with private registries
### Enter Image URL
For example:
- `quay.io/ceph/ceph:v18.2.4`
- `registry.example.com/ceph/ceph:v18.2.4`
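On the CLI, the equivalent is to pass the image reference directly; the registry below is an illustrative example:

```shell
# Upgrade to an exact container image instead of a named release
ceph orch upgrade start --image registry.example.com/ceph/ceph:v18.2.4
```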
## How to Pause an Upgrade
Pausing stops the upgrade at the current daemon while maintaining cluster stability. Click **Pause**; a paused upgrade can later be resumed from where it left off.
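If you manage the cluster with cephadm, the CLI counterpart is:

```shell
# Pause the upgrade; the daemon currently being upgraded finishes first
ceph orch upgrade pause
```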
## How to Resume an Upgrade
Click **Resume** to continue a paused upgrade from the next daemon.
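The CLI counterpart on cephadm-managed clusters:

```shell
# Continue a paused upgrade from where it left off
ceph orch upgrade resume
```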
## How to Stop an Upgrade
Stopping an upgrade cancels the process entirely; daemons that have already been upgraded keep the new version. On cephadm-managed clusters the CLI equivalent is `ceph orch upgrade stop`.
## How to View Upgrade History
### Review Past Upgrades
Each entry in the history shows:
- From/To versions
- Status (success, failed, cancelled)
- Start and completion times
- Duration
- Number of daemons upgraded
## Statistics Cards
### Current Version
The Ceph version currently running on the cluster (e.g., 18.2.2 reef).
### Target/Available
Shows either:
- Up to Date: No newer version available
- Available: A newer version is available to upgrade to
- Target: The version being upgraded to (during upgrade)
### Progress
Percentage of daemons that have been upgraded. Only shown during active upgrades.
### Daemons
Total number of Ceph daemons that will be upgraded.
### Health
Current cluster health status:
- HEALTH_OK: All healthy
- HEALTH_WARN: Warnings present (upgrade can proceed with caution)
- HEALTH_ERR: Critical issues (upgrade blocked)
## Upgrade Order
Ceph upgrades daemons in a specific order to maintain cluster stability:
1. Managers (mgr): updated first, as they coordinate the upgrade
2. Monitors (mon): quorum is maintained throughout
3. OSDs: updated one at a time to preserve data availability
4. Metadata Servers (mds): for CephFS clusters
5. RADOS Gateways (rgw): for object storage clusters
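While the upgrade walks through this order, the cluster briefly runs mixed versions; on a live cluster you can observe the transition from the CLI:

```shell
# Group running daemons by version; two versions appear mid-upgrade
ceph versions

# Stream cephadm's progress messages as daemons are redeployed
ceph -W cephadm
```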
## Troubleshooting
### Upgrade shows 'Upgrade Blocked'
- Check cluster health with `ceph health detail`
- Resolve any HEALTH_ERR conditions
- Ensure all OSDs are up and in
- Verify no other operations are in progress
### Upgrade is very slow
- Each daemon restart takes time
- Large OSDs may take longer to restart
- Check for recovery/backfill operations
- Network bandwidth affects image download speed
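To check for the recovery/backfill activity mentioned above, which commonly slows an upgrade:

```shell
# Cluster status; its progress section shows any recovery/backfill I/O
ceph -s
```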
### Upgrade stuck on a daemon
- Check the daemon’s host is accessible
- Verify the container image is available
- Review daemon logs on the host
- Consider pausing and investigating
### Cluster unhealthy after partial upgrade
- Mixed versions are supported but not ideal
- Resume the upgrade to complete it
- If issues persist, check version compatibility
- Contact support for rollback procedures
### Custom image not pulling
- Verify the image URL is correct
- Check registry authentication
- Ensure nodes can reach the registry
- Test with `podman pull <image>` on a node
### Cannot find available versions
- Versions are managed by the system administrator
- Check if versions are marked as active in settings
- Verify your cluster’s current version name matches version groups
## FAQ
### How long does an upgrade take?
It varies widely; duration depends on:
- Number of daemons in the cluster
- OSD sizes (larger OSDs take longer to restart)
- Network speed for image downloads
- Any recovery operations that occur
### Is the cluster available during upgrade?
Yes. The cluster remains online and serving clients throughout:
- Only one daemon is upgraded at a time
- Monitor quorum is preserved
- Data redundancy protects against OSD restarts
### What if I need to roll back?
Ceph does not support rolling back to an older release once daemons have been upgraded. Your options are:
- Restore from backup (if available)
- Manually reinstall previous version (complex)
- Complete the upgrade and address issues
### Can I skip versions?
It depends on the releases involved:
- Quincy → Reef (supported)
- Pacific → Reef (not recommended, upgrade to Quincy first)
### What is the recommended version?
The recommended version is determined by:
- Latest stable release in your version series
- System administrator configuration
- Known compatibility with your environment
### Should I upgrade if cluster shows HEALTH_WARN?
It depends on the warning:
- Minor warnings (clock skew, nearfull): Usually safe to proceed
- Data-related warnings (degraded PGs): Resolve first
- OSD warnings (down OSDs): Fix before upgrading
### What happens if power fails during upgrade?
The upgrade is designed to tolerate interruption:
- Completed daemons retain their new version
- In-progress daemon may need manual recovery
- The upgrade can be resumed once power is restored
- Data is protected by replication
### How do I know the upgrade completed successfully?
Check that:
- Progress shows 100%
- Current Version matches Target Version
- Cluster health returns to HEALTH_OK
- All daemons show the new version