Key Concepts
Service
A logical grouping of daemons of the same type (e.g., all monitors form the mon service).
Daemon
An individual Ceph process running on a specific host (e.g., mon.node1, osd.5).
Service Health
Determined by how many daemons are running: Healthy (all), Degraded (some), Down (none).
Redeploy
Recreates a daemon container with a fresh configuration, useful for troubleshooting.
Required Permissions
| Action | Permission |
|---|---|
| View Services | iam:project:infrastructure:ceph:read |
| View Daemons | iam:project:infrastructure:ceph:read |
| Start Daemon | iam:project:infrastructure:ceph:execute |
| Stop Daemon | iam:project:infrastructure:ceph:execute |
| Restart Daemon | iam:project:infrastructure:ceph:execute |
| Redeploy Daemon | iam:project:infrastructure:ceph:execute |
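The table above can be read as a lookup from action to required permission. The following is a minimal sketch in Python, not the product's access-control code: the action keys and the `is_allowed` helper are hypothetical, and exact string matching of permissions is an assumption.

```python
# Action names are hypothetical keys; the permission strings come from the table above.
REQUIRED_PERMISSION = {
    "view_services": "iam:project:infrastructure:ceph:read",
    "view_daemons": "iam:project:infrastructure:ceph:read",
    "start_daemon": "iam:project:infrastructure:ceph:execute",
    "stop_daemon": "iam:project:infrastructure:ceph:execute",
    "restart_daemon": "iam:project:infrastructure:ceph:execute",
    "redeploy_daemon": "iam:project:infrastructure:ceph:execute",
}


def is_allowed(action: str, granted: set[str]) -> bool:
    """Return True if the permission required for `action` is in `granted`.

    Assumption: permissions are compared as exact strings, with no wildcard
    or hierarchy expansion (holding read does not imply execute, or vice versa).
    """
    return REQUIRED_PERMISSION[action] in granted
```

Under these assumptions, a user holding only the read permission can view services and daemons but cannot start, stop, restart, or redeploy them.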
Service Types
| Type | Icon | Description |
|---|---|---|
| mon | Database | Monitor daemons - maintain cluster maps and quorum |
| mgr | Activity | Manager daemons - provide monitoring and orchestration |
| osd | Hard Drive | Object Storage Daemons - handle data storage on disks |
| mds | Layers | Metadata Server - manages CephFS metadata |
| rgw | Box | RADOS Gateway - provides S3/Swift object storage API |
| crash | Circle | Crash collector - gathers daemon crash reports |
| node-exporter | Activity | Prometheus node metrics exporter |
| prometheus | Activity | Metrics collection and alerting |
| grafana | Activity | Metrics visualization dashboards |
| alertmanager | Activity | Alert routing and notification |
Service Status
| Status | Indicator | Description |
|---|---|---|
| Healthy | Green | All daemons are running |
| Degraded | Amber | Some daemons are running, some are stopped |
| Down | Red | No daemons are running |
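The status rules above can be expressed as a small function. This is a minimal sketch in Python rather than the product's actual logic; the name `service_status` and its parameters are hypothetical. The table does not say how a service with zero daemons is reported, so treating it as Down is an assumption.

```python
def service_status(running: int, total: int) -> str:
    """Classify a service from its daemon counts (hypothetical helper)."""
    if running < 0 or total < 0 or running > total:
        raise ValueError("counts must satisfy 0 <= running <= total")
    if running == 0:
        # Assumption: a service with no daemons at all is also reported as Down.
        return "Down"
    if running == total:
        return "Healthy"  # all daemons are running
    return "Degraded"     # some running, some stopped
```

For example, a mon service with 2 of 3 daemons running would be classified as Degraded.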
How to View Services
Select Cluster
Choose a Ceph cluster from the cluster dropdown. Only ready (bootstrapped) clusters are available.
View Statistics
Review the summary cards showing:
- Services: Total number of service types
- Daemons: Total daemon processes
- Running: Number of running daemons
- Stopped: Number of stopped daemons
- Types: Distinct service types deployed
- Hosts: Number of hosts running daemons
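As an illustration of how the six summary values relate to one another, the sketch below derives them from a flat list of daemons. The `Daemon` record and `summarize` function are hypothetical, and counting every non-running daemon as stopped is an assumption. Services and Types yield the same number because the Statistics Cards section of this page describes Types as the same as the Services count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Daemon:
    """Hypothetical record for one daemon, e.g. mon.node1 or osd.5."""
    service_type: str  # "mon", "osd", ...
    daemon_id: str     # "node1", "5", ...
    host: str
    running: bool


def summarize(daemons: list[Daemon]) -> dict[str, int]:
    """Compute the summary-card values from a list of daemons."""
    types = {d.service_type for d in daemons}
    running = sum(1 for d in daemons if d.running)
    return {
        "services": len(types),
        "daemons": len(daemons),
        "running": running,
        # Assumption: any daemon that is not running counts as stopped.
        "stopped": len(daemons) - running,
        "types": len(types),
        "hosts": len({d.host for d in daemons}),
    }
```

With two running monitors on node1 and node2 and one stopped OSD on node1, this gives 2 services, 3 daemons, 2 running, 1 stopped, 2 types, and 2 hosts.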
Filter Services
Use filters to narrow down the view:
- Type Filter: Show specific service types (mon, mgr, osd, etc.)
- Status Filter: Show healthy, degraded, or stopped services
How to View Daemon Details
How to Start a Daemon
Starting a daemon may take a few moments as the container needs to initialize and connect to the cluster.
How to Stop a Daemon
How to Restart a Daemon
Restarting is useful for applying configuration changes or recovering from transient issues.
How to Redeploy a Daemon
Redeploying recreates the daemon container entirely, which can resolve issues with corrupted containers or apply new container images.
Statistics Cards
Services
The total number of distinct service types in the cluster (mon, mgr, osd, etc.).
Daemons
The total count of all daemon processes across all services and hosts.
Running
Number of daemons currently in running state and serving the cluster.
Stopped
Number of daemons that are not running. Non-zero values may indicate issues.
Types
Count of unique service types deployed (same as Services count).
Hosts
Number of distinct hosts running at least one daemon.
Troubleshooting
Daemon won't start
- Check the host is online and accessible
- Verify there’s sufficient disk space for logs
- Check for port conflicts on the host
- Review daemon logs on the host:
journalctl -u ceph-<type>@<id>
- Ensure the container image is available
Service shows Degraded status
- One or more daemons are stopped
- Expand the service to identify which daemons are down
- Try starting the stopped daemons
- Check host availability if start fails
All daemons show Stopped
- The cluster may not be properly bootstrapped
- Check network connectivity to cluster nodes
- Verify the orchestrator is running
- Check cluster health on the Clusters page
Daemon restart fails
- The underlying host may be having issues
- Check disk space and memory on the host
- Try a redeploy instead of restart
- Check systemd service status on the host
Redeploy takes too long
- Container image may be downloading
- Network issues between nodes
- Large OSD daemons take longer to initialize
- Check orchestrator logs for progress
Version mismatch between daemons
- Normal during a cluster upgrade
- Use the Upgrade page to complete version alignment
- Consider pausing workloads during upgrade
FAQ
What's the difference between restart and redeploy?
Restart: Stops and starts the daemon process. Quick and preserves the container.
Redeploy: Removes the container entirely and creates a new one from scratch. Slower but resolves container-level issues.
Use restart for routine maintenance; use redeploy for troubleshooting.
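If the cluster is managed by the cephadm orchestrator (an assumption; this page does not name the orchestrator), the daemon actions correspond to the `ceph orch daemon start`, `stop`, `restart`, and `redeploy` commands. The sketch below only builds the command line and does not run it. The helper name `orch_daemon_command` and the daemon-name pattern it checks are hypothetical.

```python
import re

# Daemon actions offered on this page.
ACTIONS = {"start", "stop", "restart", "redeploy"}

# Assumption: daemon names look like "<type>.<id>", e.g. "mon.node1" or "osd.5".
_DAEMON_NAME = re.compile(r"^[A-Za-z0-9-]+\.[A-Za-z0-9_.-]+$")


def orch_daemon_command(action: str, daemon_name: str) -> list[str]:
    """Build (but do not run) a `ceph orch daemon <action> <name>` command."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    if not _DAEMON_NAME.match(daemon_name):
        raise ValueError(f"unexpected daemon name: {daemon_name!r}")
    return ["ceph", "orch", "daemon", action, daemon_name]
```

The returned list could be passed to `subprocess.run` on a host with the Ceph CLI and admin credentials. Keeping command construction separate from execution makes the helper easy to test without a cluster.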
Can I stop all daemons of a service at once?
Currently, daemon actions are performed individually. This is intentional to prevent accidental service outages. For maintenance, consider using node maintenance mode instead.
Why are some services not showing any daemons?
Some services may be defined but not yet deployed:
- osd: No disks have been added as OSDs
- mds: CephFS is not configured
- rgw: Object gateway is not deployed
How do I know if it's safe to stop a daemon?
Consider these factors:
- mon: Need majority for quorum (e.g., 2 of 3)
- mgr: Need at least 1 active manager
- osd: Data should be replicated to other OSDs
- mds: Standby MDS can take over
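The monitor rule can be checked with simple arithmetic: quorum needs a strict majority of all monitors, so at least floor(n/2) + 1 must remain running. The sketch below is illustrative only; the function name and parameters are hypothetical, and it does not query a cluster.

```python
def mon_quorum_survives(total_mons: int, running_mons: int, stopping: int = 1) -> bool:
    """Return True if a strict majority of monitors remains after stopping some.

    total_mons:   monitors defined in the cluster
    running_mons: monitors currently running
    stopping:     how many running monitors you intend to stop
    """
    if not 0 <= stopping <= running_mons <= total_mons:
        raise ValueError("expected 0 <= stopping <= running_mons <= total_mons")
    needed = total_mons // 2 + 1  # strict majority, e.g. 2 of 3 or 3 of 5
    return running_mons - stopping >= needed
```

With 3 monitors all running, stopping one leaves 2, which still meets the majority of 2. If one is already down, stopping another leaves 1 and quorum is lost. On a live cluster, the Ceph CLI also provides `ceph mon ok-to-stop` and `ceph osd ok-to-stop` checks, which may be a more reliable guide than this arithmetic alone.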
Why does the daemon version differ from cluster version?
This typically occurs during or after an upgrade:
- Daemons upgrade one at a time
- Some daemons may not have restarted yet
- Manual intervention may be needed for stuck daemons
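To find which daemons are still on an older version, it can help to group daemons by their reported version. The following is a minimal sketch that assumes you already have a mapping of daemon name to version string. The function names are hypothetical, and the version strings in the usage lines are illustrative values, not output from a real cluster.

```python
from collections import Counter


def version_summary(daemon_versions: dict[str, str]) -> Counter:
    """Count how many daemons report each version."""
    return Counter(daemon_versions.values())


def outdated_daemons(daemon_versions: dict[str, str], target_version: str) -> list[str]:
    """Return the names of daemons whose version differs from the target, sorted."""
    return sorted(
        name for name, version in daemon_versions.items()
        if version != target_version
    )


# Illustrative values only.
versions = {"mon.node1": "18.2.1", "mgr.node1": "18.2.1", "osd.5": "18.2.0"}
print(version_summary(versions))
print(outdated_daemons(versions, "18.2.1"))
```

More than one distinct version in the summary indicates that an upgrade is in progress or incomplete; the daemons listed as outdated are the candidates for a restart or manual follow-up.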
What are crash, node-exporter, and other monitoring daemons?
These are auxiliary services that support cluster monitoring:
- crash: Collects crash dumps from failed daemons
- node-exporter: Exports host metrics to Prometheus
- prometheus: Stores and queries metrics
- grafana: Visualizes metrics dashboards
- alertmanager: Handles alert routing
How often is the daemon status updated?
The page shows a “Last Refresh” timestamp for each daemon. Status is updated:
- When you load or refresh the page
- After you perform a daemon action
- Periodically, when the orchestrator checks daemon health
Can I see daemon logs from this page?
Currently, daemon logs are not viewable directly from this page. To view logs:
- SSH to the host running the daemon
- Use journalctl -u ceph-<type>@<id> for systemd logs
- Check /var/log/ceph/ for Ceph-specific logs
- Use the Ceph Dashboard for centralized log viewing