Key Concepts
OSD
Object Storage Daemon - a service that stores data on a physical disk and handles replication.
Device Class
The type of storage device: HDD (rotational), SSD (solid-state), or NVMe (high-speed SSD).
Up/Down
Whether the OSD process is running (Up) or stopped (Down).
In/Out
Whether the OSD is participating in data placement (In) or excluded from it (Out).
Required Permissions
| Action | Permission |
|---|---|
| View OSDs | iam:project:infrastructure:ceph:read |
| Add OSD | iam:project:infrastructure:ceph:write |
| Mark In/Out | iam:project:infrastructure:ceph:write |
| Reweight OSD | iam:project:infrastructure:ceph:write |
| Scrub OSD | iam:project:infrastructure:ceph:execute |
| Remove OSD | iam:project:infrastructure:ceph:delete |
OSD Status
Up/Down Status
| Status | Description |
|---|---|
| Up | OSD daemon is running and responsive |
| Down | OSD daemon is stopped or not responding |
In/Out Status
| Status | Description |
|---|---|
| In | OSD participates in data placement and receives data |
| Out | OSD is excluded from data placement; data migrates away |
An OSD can be Up but Out - this means it’s running but not receiving new data. This is commonly used during maintenance or before removal.
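If you also have shell access to the cluster, the OSD tree shows these states together. A reference sketch; the exact columns vary by Ceph release:

```bash
# List all OSDs with their up/down status, reweight, host, and CRUSH weight
ceph osd tree

# Show only OSDs currently marked down
ceph osd tree down
```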
Device Classes
| Class | Description |
|---|---|
| HDD | Traditional rotational hard disk drive |
| SSD | Solid-state drive with faster random I/O |
| NVMe | High-performance NVMe solid-state drive |
How to View OSDs
Select Cluster
Choose a Ceph cluster from the cluster dropdown. The first ready cluster is selected by default.
View OSD List
The table shows all OSDs with their status, host, device class, placement groups, and utilization.
Filter and Search
Use the search box to find OSDs by ID, hostname, or device class. Filter by status (Up, Down, In, Out).
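The same information is available from a cluster shell if you prefer the CLI; a sketch, with output formatting depending on your Ceph version:

```bash
# Per-OSD capacity, utilization, and placement group counts
ceph osd df

# The same data grouped by the CRUSH hierarchy (host, rack, ...)
ceph osd df tree
```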
How to Add an OSD
Adding an OSD creates a new storage daemon on an available disk.
Select Disks
Check the disks you want to use as OSDs. Each disk shows:
- Device path (e.g., `/dev/sdb`)
- Host where the disk is located
- Disk size
- Device type (HDD, SSD, NVMe)
Only available disks that don’t already have OSDs are shown. If no disks appear, all disks may already be in use or there may be no OSD-role nodes in the cluster.
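If the UI shows no candidate disks, you can cross-check what the orchestrator sees from a cluster shell. This is a minimal sketch; `host1` and `/dev/sdb` are placeholder values, and the UI normally performs the equivalent of the add command for you.

```bash
# List devices the orchestrator considers available for new OSDs
ceph orch device ls --wide

# Manually create an OSD on a specific disk (roughly what selecting a disk in the UI does)
ceph orch daemon add osd host1:/dev/sdb
```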
How to Mark an OSD Out
Marking an OSD “out” removes it from data placement, causing data to migrate to other OSDs. Mark Out is the first step in safely removing an OSD. It triggers data migration without stopping the OSD daemon.
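If you also work from a cluster shell, the CLI equivalent looks like this (the OSD ID is an example):

```bash
# Exclude osd.5 from data placement; data begins migrating to the remaining OSDs
ceph osd out 5
```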
How to Mark an OSD In
Marking an OSD “in” adds it back to data placement.
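The CLI equivalent (again with an example OSD ID):

```bash
# Return osd.5 to data placement; the cluster rebalances data back onto it
ceph osd in 5
```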
How to Scrub an OSD
Scrubbing verifies data integrity by comparing object replicas across OSDs.
Select Scrub Type
Choose either:
- Scrub: Light verification of object metadata
- Deep Scrub: Full verification including data checksums
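From a cluster shell, the same two operations can be triggered per OSD (the ID is an example):

```bash
# Light scrub: verify object metadata on osd.3
ceph osd scrub 3

# Deep scrub: also read and checksum the object data (I/O intensive)
ceph osd deep-scrub 3
```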
How to Remove an OSD
Removing an OSD is a multi-step process to ensure data safety. A guided wizard walks you through the process.
Safety Check
The system checks if removal is safe:
- Safe to Destroy: All data has sufficient replicas elsewhere
- Not Safe: Removal would cause data loss (requires force removal)
Data Migration
Click Start Migration to mark OSDs out and begin data migration. Wait for all placement groups to reach the `active+clean` state.
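To confirm from a cluster shell that migration has finished, a quick sketch:

```bash
# Summary of placement group states; every PG should report active+clean
ceph pg stat

# Broader view, including recovery and backfill progress
ceph -s
```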
OSD Table Fields
| Field | Description |
|---|---|
| OSD | OSD identifier (e.g., osd.0, osd.1) |
| Host | Node where the OSD is running |
| Class | Device class (HDD, SSD, NVMe) |
| Status | Up/Down and In/Out status badges |
| PGs | Number of placement groups on this OSD |
| Size | Total capacity of the OSD |
| Usage | Current utilization percentage with visual bar |
Removal Wizard Steps
Step 1: Pre-flight Check
Shows selected OSDs with:
- OSD ID and hostname
- Device class
- Data to migrate
- Placement group count
Step 2: Safety Check
Verifies if OSDs can be safely destroyed:
- Checks replica counts for all affected placement groups
- Shows “Safe to Destroy” or “Not Safe to Destroy”
- Option to force removal if unsafe (not recommended)
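This step corresponds to Ceph's own safe-to-destroy test, which you can also run manually as a cross-check (the OSD IDs are examples):

```bash
# Succeeds only if destroying these OSDs would not reduce data durability
ceph osd safe-to-destroy 5 7
```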
Step 3: Data Migration
- Marks OSDs as “out”
- Monitors data migration progress
- Shows when all placement groups are active+clean
- Option to skip waiting (may cause data loss)
Step 4: Confirm Removal
- Requires typing REMOVE to confirm
- Option to force removal (skips safety checks)
- Executes OSD deletion
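For reference, the manual CLI counterpart of this deletion step is roughly the following; the wizard performs it for you, and the OSD ID is an example:

```bash
# Remove osd.5 from the CRUSH map, delete its auth key, and remove it from the cluster
ceph osd purge 5 --yes-i-really-mean-it
```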
Step 5: Cleanup
- Shows completion status
- Provides zap commands for disk cleanup
- Commands can be copied to clipboard
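The zap commands provided by the wizard look roughly like this (`host1` and `/dev/sdb` are placeholders):

```bash
# Wipe Ceph metadata, partition tables, and LVM state so the disk can be reused
ceph orch device zap host1 /dev/sdb --force
```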
Troubleshooting
OSD shows Down status
- Check if the OSD host is accessible
- Verify the OSD daemon is running: `systemctl status ceph-osd@<id>`
- Check for disk failures or hardware issues
- Review OSD logs for errors
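Some useful shell checks when an OSD is down; a sketch that assumes a non-containerized deployment (cephadm-managed clusters use per-cluster systemd unit names instead), with osd.5 as an example:

```bash
# Locate the OSD: which host is it on?
ceph osd find 5

# On that host: is the daemon running?
systemctl status ceph-osd@5

# Recent log output for the daemon
journalctl -u ceph-osd@5 --since "1 hour ago"
```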
No available disks for new OSD
- All disks may already have OSDs
- Ensure the node has the OSD role assigned
- Check if disks are properly detected by the system
- Verify disks aren’t mounted or in use by other services
OSD utilization is very high
- Consider adding more OSDs to the cluster
- Check if other OSDs are down or out
- Verify CRUSH rules are distributing data evenly
- Consider reweighting OSDs to balance load
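If a few OSDs are much fuller than the rest, Ceph can adjust reweights automatically based on utilization. A hedged sketch; run the dry-run variant first:

```bash
# Show what would change without applying anything
ceph osd test-reweight-by-utilization

# Reweight OSDs above 120% of average utilization (the default threshold)
ceph osd reweight-by-utilization
```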
Data migration is slow
- This is normal for large amounts of data
- Check network bandwidth between nodes
- Verify no backfill/recovery throttling is set too low
- Monitor `ceph status` for recovery progress
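Backfill and recovery throttles can be inspected and, cautiously, raised from a cluster shell. A sketch; note that on recent releases the mClock scheduler may override these values unless explicitly allowed:

```bash
# Current throttle values
ceph config get osd osd_max_backfills
ceph config get osd osd_recovery_max_active

# Allow more concurrent backfill operations per OSD (trades client I/O for faster migration)
ceph config set osd osd_max_backfills 2
```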
Cannot remove OSD - not safe to destroy
- Some placement groups don’t have enough replicas
- Wait for recovery to complete
- Check if other OSDs are down
- Use force removal only if you accept potential data loss
OSD removal stuck
- Check cluster health for blocking issues
- Verify network connectivity
- Check if the OSD daemon is still running
- Review operation logs for specific errors
Scrub taking too long
- Deep scrub is I/O intensive on large OSDs
- Check OSD performance and disk health
- Consider adjusting scrub scheduling options
- Large OSDs with many objects take longer
FAQ
What is the difference between Up/Down and In/Out?
Up/Down indicates whether the OSD daemon process is running. In/Out indicates whether the OSD participates in data placement. An OSD can be:
- Up + In: Normal operation, storing and serving data
- Up + Out: Running but not receiving data (draining)
- Down + In: Not running but expected to return (temporary failure)
- Down + Out: Not running and excluded from placement
When should I mark an OSD out vs remove it?
Mark Out when:
- Performing temporary maintenance
- The OSD will return to service
- You want to drain data without removing the OSD
Remove when:
- Decommissioning a disk permanently
- Replacing failed hardware
- The OSD will not return to service
What happens when I add a new OSD?
When you add an OSD:
- The Ceph orchestrator deploys an OSD daemon on the disk
- The OSD is added to the CRUSH map
- The cluster begins rebalancing data to include the new OSD
- Data gradually distributes across all OSDs
How long does OSD removal take?
Removal time depends on:
- Amount of data on the OSD
- Network speed between nodes
- Number of remaining OSDs
- Current cluster load
What is the 'safe to destroy' check?
This check verifies that removing the OSD won’t cause data loss:
- Checks all placement groups on the OSD
- Ensures each PG has sufficient replicas on other OSDs
- If any PG would lose its last copy, removal is blocked
What are the zap commands for?
After removing an OSD, the disk may still have Ceph metadata. The `ceph orch device zap` command:
- Removes all Ceph data from the disk
- Clears partition tables and LVM data
- Makes the disk available for reuse
Should I use force removal?
Avoid force removal unless you:
- Understand the risk of data loss
- Have backups of critical data
- Are removing already-failed OSDs
- Accept that some data may be lost
What is OSD reweight?
Reweight adjusts how much data an OSD receives (0.0 to 1.0):
- 1.0: Full weight, normal data distribution
- 0.5: Half weight, receives half the normal data
- 0.0: No data (equivalent to marking out)
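From a cluster shell, reweight is set per OSD with a value between 0.0 and 1.0 (the OSD ID is an example):

```bash
# Halve the amount of data placed on osd.5
ceph osd reweight 5 0.5

# Restore full weight
ceph osd reweight 5 1.0
```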
When should I run a deep scrub?
Deep scrub verifies data integrity at the bit level:
- Run periodically for data verification
- After suspected disk issues
- When data corruption is suspected