Object Storage Daemons (OSDs) are the core storage components of a Ceph cluster. Each OSD manages a physical disk and handles data replication, recovery, and rebalancing. This page allows you to view OSD status, add new OSDs, and safely remove existing ones.
Key Concepts
- OSD: An Object Storage Daemon that manages a single physical disk and stores and serves data for the cluster
- Device Class: The type of underlying disk (HDD, SSD, or NVMe), used to target storage tiers
- Up/Down: Whether the OSD daemon is running and responsive
- In/Out: Whether the OSD participates in data placement
Required Permissions
| Action | Permission |
|---|---|
| View OSDs | iam:project:infrastructure:ceph:read |
| Add OSD | iam:project:infrastructure:ceph:write |
| Mark In/Out | iam:project:infrastructure:ceph:write |
| Reweight OSD | iam:project:infrastructure:ceph:write |
| Scrub OSD | iam:project:infrastructure:ceph:execute |
| Remove OSD | iam:project:infrastructure:ceph:delete |
OSD Status
Up/Down Status
| Status | Description |
|---|---|
| Up | OSD daemon is running and responsive |
| Down | OSD daemon is stopped or not responding |
In/Out Status
| Status | Description |
|---|---|
| In | OSD participates in data placement and receives data |
| Out | OSD is excluded from data placement; data migrates away |
Device Classes
| Class | Description |
|---|---|
| HDD | Traditional rotational hard disk drive |
| SSD | Solid-state drive with faster random I/O |
| NVMe | High-performance NVMe solid-state drive |
How to View OSDs
Select Cluster
View OSD List
Filter and Search
How to Add an OSD
Adding an OSD creates a new storage daemon on an available disk.
Select Disks
Available disks are listed with:
- Device path (e.g., `/dev/sdb`)
- Host where the disk is located
- Disk size
- Device type (HDD, SSD, NVMe)
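In a cephadm-managed cluster (an assumption about the backend), adding an OSD corresponds roughly to the following orchestrator commands; the host name `ceph-node2` and device path `/dev/sdb` are placeholders:

```shell
# Illustrative sketch; requires admin access on a cluster node.
ceph orch device ls                           # disks the orchestrator sees as available
ceph orch daemon add osd ceph-node2:/dev/sdb  # create an OSD on a specific host:device
```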
How to Mark an OSD Out
Marking an OSD “out” removes it from data placement, causing data to migrate to other OSDs.
How to Mark an OSD In
Marking an OSD “in” adds it back to data placement.
How to Scrub an OSD
Scrubbing verifies data integrity by comparing object replicas across OSDs.
Select Scrub Type
- Scrub: Light verification of object metadata
- Deep Scrub: Full verification including data checksums
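The mark-out, mark-in, and scrub actions above map onto standard ceph CLI commands; a sketch using OSD id 3 as a placeholder:

```shell
# Placeholder OSD id 3; run against a live cluster with admin access.
ceph osd out 3          # exclude osd.3 from placement; data migrates away
ceph osd in 3           # re-include osd.3 in placement
ceph osd scrub 3        # light scrub of object metadata
ceph osd deep-scrub 3   # full scrub including data checksums
```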
How to Remove an OSD
Removing an OSD is a multi-step process to ensure data safety. A guided wizard walks you through the process.
Safety Check
- Safe to Destroy: All data has sufficient replicas elsewhere
- Not Safe: Removal would cause data loss (requires force removal)
Data Migration
OSD Table Fields
| Field | Description |
|---|---|
| OSD | OSD identifier (e.g., osd.0, osd.1) |
| Host | Node where the OSD is running |
| Class | Device class (HDD, SSD, NVMe) |
| Status | Up/Down and In/Out status badges |
| PGs | Number of placement groups on this OSD |
| Size | Total capacity of the OSD |
| Usage | Current utilization percentage with visual bar |
Removal Wizard Steps
Step 1: Pre-flight Check
Shows selected OSDs with:
- OSD ID and hostname
- Device class
- Data to migrate
- Placement group count
Step 2: Safety Check
Verifies if OSDs can be safely destroyed:
- Checks replica counts for all affected placement groups
- Shows “Safe to Destroy” or “Not Safe to Destroy”
- Option to force removal if unsafe (not recommended)
Step 3: Data Migration
- Marks OSDs as “out”
- Monitors data migration progress
- Shows when all placement groups are active+clean
- Option to skip waiting (may cause data loss)
Step 4: Confirm Removal
- Requires typing REMOVE to confirm
- Option to force removal (skips safety checks)
- Executes OSD deletion
Step 5: Cleanup
- Shows completion status
- Provides zap commands for disk cleanup
- Commands can be copied to clipboard
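For reference, the wizard's steps correspond roughly to this manual command sequence. This is a sketch assuming a cephadm-managed cluster with admin keyring access; the OSD id defaults to a placeholder value of 3, and `DRY_RUN=1` (the default here) only prints the commands so the sequence can be reviewed before use:

```shell
#!/usr/bin/env bash
# Sketch of the manual equivalent of the removal wizard.
set -euo pipefail

OSD_ID="${1:-3}"        # placeholder OSD id
DRY_RUN="${DRY_RUN:-1}" # default: only print commands, do not execute

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

run ceph osd out "$OSD_ID"                          # stop new data placement; migration begins
run ceph osd safe-to-destroy "osd.$OSD_ID"          # verify replicas exist elsewhere
run ceph osd purge "$OSD_ID" --yes-i-really-mean-it # remove from CRUSH map and auth
```

With `DRY_RUN=0` the commands execute for real; in practice you would also wait for `ceph -s` to report all placement groups active+clean between marking out and purging.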
Troubleshooting
OSD shows Down status
- Check if the OSD host is accessible
- Verify the OSD daemon is running: `systemctl status ceph-osd@<id>`
- Check for disk failures or hardware issues
- Review OSD logs for errors
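On hosts where OSDs run as plain systemd units, the checks above might look like the following (OSD id 3 is a placeholder; cephadm deployments use `ceph orch ps` and `cephadm logs` instead):

```shell
ceph osd tree                    # which OSDs are reported down
systemctl status ceph-osd@3      # is the daemon running?
journalctl -u ceph-osd@3 -n 100  # recent daemon log lines
```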
No available disks for new OSD
- All disks may already have OSDs
- Ensure the node has the OSD role assigned
- Check if disks are properly detected by the system
- Verify disks aren’t mounted or in use by other services
OSD utilization is very high
- Consider adding more OSDs to the cluster
- Check if other OSDs are down or out
- Verify CRUSH rules are distributing data evenly
- Consider reweighting OSDs to balance load
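A sketch of CLI checks for the points above (the threshold value is an example, not a recommendation):

```shell
ceph osd df tree                      # per-OSD utilization, PG counts, and host layout
ceph health detail                    # shows down/out OSDs and nearfull warnings
ceph osd reweight-by-utilization 110  # lower weight of OSDs above 110% of mean utilization
```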
Data migration is slow
- This is normal for large amounts of data
- Check network bandwidth between nodes
- Verify no backfill/recovery throttling is set too low
- Monitor `ceph status` for recovery progress
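These checks can be run from the CLI; raising `osd_max_backfills` is an example tuning step under the assumption that default throttles apply, not a universal recommendation:

```shell
ceph -s                                   # recovery/backfill progress summary
ceph config get osd osd_max_backfills     # current backfill throttle
ceph config set osd osd_max_backfills 4   # allow more concurrent backfills (example value)
```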
Cannot remove OSD - not safe to destroy
- Some placement groups don’t have enough replicas
- Wait for recovery to complete
- Check if other OSDs are down
- Use force removal only if you accept potential data loss
OSD removal stuck
- Check cluster health for blocking issues
- Verify network connectivity
- Check if the OSD daemon is still running
- Review operation logs for specific errors
Scrub taking too long
- Deep scrub is I/O intensive on large OSDs
- Check OSD performance and disk health
- Consider adjusting scrub scheduling options
- Large OSDs with many objects take longer
FAQ
What is the difference between Up/Down and In/Out?
- Up + In: Normal operation, storing and serving data
- Up + Out: Running but not receiving data (draining)
- Down + In: Not running but expected to return (temporary failure)
- Down + Out: Not running and excluded from placement
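Both dimensions are visible in `ceph osd tree` output: the STATUS column shows up/down, and a REWEIGHT of 0 indicates the OSD is out:

```shell
ceph osd tree   # STATUS column = up/down; REWEIGHT 0 = out
```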
When should I mark an OSD out vs remove it?
Mark an OSD out when:
- Performing temporary maintenance
- The OSD will return to service
- You want to drain data without removing the OSD
Remove an OSD when:
- Decommissioning a disk permanently
- Replacing failed hardware
- The OSD will not return to service
What happens when I add a new OSD?
- The Ceph orchestrator deploys an OSD daemon on the disk
- The OSD is added to the CRUSH map
- The cluster begins rebalancing data to include the new OSD
- Data gradually distributes across all OSDs
How long does OSD removal take?
Removal time depends on:
- Amount of data on the OSD
- Network speed between nodes
- Number of remaining OSDs
- Current cluster load
What is the 'safe to destroy' check?
- Checks all placement groups on the OSD
- Ensures each PG has sufficient replicas on other OSDs
- If any PG would lose its last copy, removal is blocked
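The same check is exposed directly in the CLI (osd.3 is a placeholder):

```shell
ceph osd safe-to-destroy osd.3   # reports an error if any PG would lose its last replica
```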
What are the zap commands for?
The `ceph orch device zap` command:
- Removes all Ceph data from the disk
- Clears partition tables and LVM data
- Makes the disk available for reuse
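A typical zap command as provided by the cleanup step; host name and device path are placeholders:

```shell
# Wipes partitions and LVM data so the disk can be reused.
ceph orch device zap ceph-node2 /dev/sdb --force
```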
Should I use force removal?
Only use force removal if you:
- Understand the risk of data loss
- Have backups of critical data
- Are removing already-failed OSDs
- Accept that some data may be lost
What is OSD reweight?
Reweight adjusts the share of data CRUSH assigns to an OSD, on a scale from 0.0 to 1.0:
- 1.0: Full weight, normal data distribution
- 0.5: Half weight, receives half the normal data
- 0.0: No data (equivalent to marking out)
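From the CLI, where the OSD id 3 and the weight are example values:

```shell
ceph osd reweight 3 0.5   # osd.3 receives roughly half its normal share of data
```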
When should I run a deep scrub?
- Run periodically for data verification
- After suspected disk issues
- When data corruption is suspected