The Upgrade page allows you to upgrade your Ceph cluster to newer versions. Upgrades are performed in a controlled manner, updating one daemon at a time to minimize disruption. You can monitor progress, pause, resume, or stop upgrades as needed.

Key Concepts

Rolling Upgrade

Ceph upgrades one daemon at a time, maintaining cluster availability throughout the process.

Target Version

The Ceph version or container image you are upgrading to.

Daemon

A Ceph service process (mon, mgr, osd, mds, rgw) that gets upgraded individually.
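
If the cluster is managed by cephadm, the daemons described on this page can be listed from any admin node. A quick CLI cross-check of what the UI shows (assuming cephadm; other orchestrators differ):

  ceph orch ps    # one row per daemon: name, host, status, and running version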

Upgrade Blockers

Conditions that prevent an upgrade from starting, such as cluster health errors.

Required Permissions

Action               Permission
View Upgrade Status  iam:project:infrastructure:ceph:read
Start Upgrade        iam:project:infrastructure:ceph:write
Pause Upgrade        iam:project:infrastructure:ceph:write
Resume Upgrade       iam:project:infrastructure:ceph:write
Stop Upgrade         iam:project:infrastructure:ceph:write
View History         iam:project:infrastructure:ceph:read

Upgrade Status

Status       Description
Up to Date   Cluster is running the latest available version
Available    A newer version is available for upgrade
In Progress  Upgrade is currently running
Paused       Upgrade is paused and can be resumed

How to Check Available Upgrades

1. Select Cluster

Choose a Ceph cluster from the cluster dropdown.

2. View Current Status

Review the statistics cards showing:
  • Current Version: The version currently running
  • Target: Available upgrade target or “Up to Date”
  • Progress: Percentage complete (if upgrading)
  • Daemons: Number of daemons in the cluster
  • Health: Current cluster health status
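
These cards mirror what the standard Ceph CLI reports. If you have shell access to an admin node, you can cross-check them directly:

  ceph -s          # overall cluster status, including health
  ceph versions    # daemon counts grouped by the version they are running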

3. Check for Blockers

If blockers exist, they will be displayed with warnings. Common blockers include:
  • HEALTH_ERR: Cluster has critical health errors
  • Active maintenance operations
  • Insufficient redundancy
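
Each blocker corresponds to a condition you can inspect from the CLI, for example:

  ceph health detail    # explains every active health check, including HEALTH_ERR causes
  ceph -s               # shows in-flight recovery, backfill, or maintenance activity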

How to Start an Upgrade

1. Click Start Upgrade

Click the Start Upgrade button to open the upgrade wizard.

2. Pre-flight Check

The wizard checks cluster health and readiness:
  • Cluster health status (HEALTH_OK, HEALTH_WARN, HEALTH_ERR)
  • Critical blockers are highlighted
Critical blockers (like HEALTH_ERR) must be resolved before upgrading. Non-critical warnings allow proceeding with caution.

3. Select Version

Choose your upgrade target:
  • Available Versions: Pre-defined versions from the registry
  • Custom Image: Specify a container image URL directly
The recommended version is highlighted if available.
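
For comparison, the two choices map onto the two forms of the cephadm upgrade command. This is the generic orchestrator CLI (shown with the 18.2.4 release used elsewhere on this page), not necessarily what the wizard invokes internally:

  ceph orch upgrade start --ceph-version 18.2.4                # a published release
  ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4    # an explicit container image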

4. Confirm Upgrade

Type UPGRADE to confirm you want to proceed. This prevents accidental upgrades.

5. Monitor Progress

The wizard shows real-time progress:
  • Overall progress percentage
  • Current daemon being upgraded
  • Daemon upgrade status (pending, upgrading, upgraded)
IMPORTANT: Upgrades can take significant time depending on cluster size. Do not interrupt power or network during the upgrade process.
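
On a cephadm-managed cluster, the same progress can also be followed from a shell:

  ceph orch upgrade status    # target image and a summary of progress so far
  ceph -W cephadm             # stream orchestrator events as each daemon is redeployed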

How to Use Custom Image Upgrades

Custom image upgrades allow you to specify an exact container image URL, useful for:
  • Testing pre-release versions
  • Using custom-built images
  • Air-gapped environments with private registries

1. Start Upgrade Wizard

Click Start Upgrade and complete the pre-flight check.

2. Select Custom Image

Toggle the Use Custom Image option in the version selection step.

3. Enter Image URL

Enter the full container image URL, for example:
  • quay.io/ceph/ceph:v18.2.4
  • registry.example.com/ceph/ceph:v18.2.4
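
Before starting, it can be worth confirming that the tag actually resolves from a cluster node. A minimal check, using the example registry above:

  podman pull registry.example.com/ceph/ceph:v18.2.4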

4. Confirm and Start

Complete the confirmation step to begin the upgrade.

How to Pause an Upgrade

Pausing stops the upgrade at the current daemon while maintaining cluster stability.

1. Locate Pause Button

During an active upgrade, the Pause button appears in the upgrade controls.

2. Click Pause

Click Pause to halt the upgrade process. The daemon currently being upgraded finishes before the pause takes effect.

3. Verify Paused State

The status changes to “Paused” and the cluster remains in a mixed-version state.
Pausing is useful when you need to investigate issues or perform other maintenance. The cluster remains operational during the pause.
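
If you prefer a shell, the equivalent on a cephadm-managed cluster is a single command (the generic orchestrator CLI, not necessarily what the button calls):

  ceph orch upgrade pause
  ceph orch upgrade status    # reports whether the upgrade is currently paused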

How to Resume an Upgrade

1. Verify Paused State

Ensure the upgrade shows as “Paused” in the status.

2. Click Resume

Click the Resume button to continue the upgrade from where it stopped.

3. Monitor Progress

The upgrade continues with the next daemon in sequence.
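
The matching CLI command on a cephadm-managed cluster:

  ceph orch upgrade resume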

How to Stop an Upgrade

Stopping an upgrade cancels the process entirely.

1. Click Stop

Click the Stop button during an active or paused upgrade.

2. Confirm Stop

Confirm that you want to stop the upgrade.
Stopping an upgrade leaves the cluster in a mixed-version state. While Ceph supports mixed versions, it’s recommended to either complete the upgrade or roll back manually.
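
From a shell on a cephadm-managed cluster, stopping and then inspecting the resulting version mix looks like this:

  ceph orch upgrade stop
  ceph versions    # shows how many daemons remain on each version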

How to View Upgrade History

1. Navigate to History Tab

Click the History tab on the Upgrade page.

2. Review Past Upgrades

The history table shows:
  • From/To versions
  • Status (success, failed, cancelled)
  • Start and completion times
  • Duration
  • Number of daemons upgraded

3. Filter Results

Use search and pagination to find specific upgrade records.

Statistics Cards

Current Version

The Ceph version currently running on the cluster (e.g., 18.2.2 reef).

Target/Available

Shows either:
  • Up to Date: No newer version available
  • Available: A newer version is available to upgrade to
  • Target: The version being upgraded to (during upgrade)

Progress

Percentage of daemons that have been upgraded. Only shown during active upgrades.

Daemons

Total number of Ceph daemons that will be upgraded.

Health

Current cluster health status:
  • HEALTH_OK: All healthy
  • HEALTH_WARN: Warnings present (upgrade can proceed with caution)
  • HEALTH_ERR: Critical issues (upgrade blocked)

Upgrade Order

Ceph upgrades daemons in a specific order to maintain cluster stability:
  1. Managers (mgr) - Updated first as they coordinate the upgrade
  2. Monitors (mon) - Quorum is maintained throughout
  3. OSDs - Updated one at a time to preserve data availability
  4. Metadata Servers (mds) - For CephFS clusters
  5. RADOS Gateways (rgw) - For object storage clusters
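
You can watch this ordering play out with ceph versions, which groups daemon counts by type: mid-upgrade, managers and monitors typically already report the new version while OSDs are still split. The output below is an illustrative sketch, not captured from a real cluster:

  ceph versions

  # Illustrative mid-upgrade shape (counts and build details are hypothetical):
  # {
  #     "mgr": { "ceph version 18.2.4 (...) reef (stable)": 2 },
  #     "mon": { "ceph version 18.2.4 (...) reef (stable)": 3 },
  #     "osd": {
  #         "ceph version 18.2.2 (...) reef (stable)": 5,
  #         "ceph version 18.2.4 (...) reef (stable)": 3
  #     }
  # }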

Troubleshooting

Upgrade won’t start

  • Check cluster health with ceph health detail
  • Resolve any HEALTH_ERR conditions
  • Ensure all OSDs are up and in
  • Verify no other operations are in progress

Upgrade is slow

  • Each daemon restart takes time
  • Large OSDs may take longer to restart
  • Check for recovery/backfill operations
  • Network bandwidth affects image download speed

A daemon fails to upgrade

  • Check that the daemon’s host is accessible
  • Verify the container image is available
  • Review daemon logs on the host
  • Consider pausing and investigating

Cluster is stuck in a mixed-version state

  • Mixed versions are supported but not ideal
  • Resume the upgrade to complete it
  • If issues persist, check version compatibility
  • Contact support for rollback procedures

Custom image pull fails

  • Verify the image URL is correct
  • Check registry authentication
  • Ensure nodes can reach the registry
  • Test with podman pull <image> on a node (see the sketch below)
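
A minimal manual check from one node, assuming podman and the example registry used earlier on this page:

  curl -fsS https://registry.example.com/v2/            # is the registry API reachable?
  podman login registry.example.com                     # only needed for authenticated registries
  podman pull registry.example.com/ceph/ceph:v18.2.4    # can the node actually pull the image?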

Expected versions are not listed

  • Versions are managed by the system administrator
  • Check if versions are marked as active in settings
  • Verify your cluster’s current version name matches version groups

FAQ

How long does an upgrade take?

Upgrade duration depends on:
  • Number of daemons in the cluster
  • OSD sizes (larger OSDs take longer to restart)
  • Network speed for image downloads
  • Any recovery operations that occur
Small clusters may complete in under an hour; large clusters can take several hours.

Can I use the cluster during an upgrade?

Yes. Ceph’s rolling upgrade process maintains cluster availability:
  • Only one daemon is upgraded at a time
  • Monitor quorum is preserved
  • Data redundancy protects against OSD restarts
Clients may experience brief latency spikes during daemon restarts.

Can I roll back a failed upgrade?

Ceph doesn’t support automatic rollback. Options include:
  • Restore from backup (if available)
  • Manually reinstall the previous version (complex)
  • Complete the upgrade and address issues
Prevention is best: test upgrades in a staging environment first.

Can I skip versions when upgrading?

Generally, you should upgrade one major version at a time. For example:
  • Quincy → Reef (supported)
  • Pacific → Reef (not recommended; upgrade to Quincy first)
Check Ceph’s official upgrade documentation for supported paths.

Can I upgrade while the cluster reports HEALTH_WARN?

It depends on the warning:
  • Minor warnings (clock skew, nearfull): Usually safe to proceed
  • Data-related warnings (degraded PGs): Resolve first
  • OSD warnings (down OSDs): Fix before upgrading
Review each warning to understand its impact.

What happens if power or network is lost mid-upgrade?

Ceph is resilient to interruptions:
  • Completed daemons retain their new version
  • The in-progress daemon may need manual recovery
  • The upgrade can be resumed once power is restored
  • Data is protected by replication

How do I know the upgrade succeeded?

Signs of a successful upgrade:
  • Progress shows 100%
  • Current Version matches Target Version
  • Cluster health returns to HEALTH_OK
  • All daemons show the new version
Check the History tab for the official completion record.