Key Concepts
Cluster Logs
Operational messages from Ceph daemons including health changes, OSD events, and monitor activity.
Audit Logs
Security-relevant events tracking who performed what actions on the cluster.
Alerts
Prometheus alerts triggered by threshold violations or abnormal conditions.
Severity Levels
Classification of events by importance: Critical, Error, Warning, Info, Debug.
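Because the levels are ordered, events can be ranked and sorted so the most important appear first. The following is a minimal sketch. It assumes numeric ranks, a `Severity` enum, and dict-shaped events with a `level` key; these are illustrative choices, not the page's internal model.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ranking: a higher value means a more important event.
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


def sort_by_importance(events: list[dict]) -> list[dict]:
    """Return events ordered from most to least important.

    Each event is assumed to be a dict with a "level" key such as "ERROR".
    Python's sort is stable (also with reverse=True), so events of equal
    severity keep their original relative order.
    """
    return sorted(events, key=lambda e: Severity[e["level"].upper()], reverse=True)
```

For example, a list holding one INFO, one CRITICAL, and one WARNING event comes back in the order CRITICAL, WARNING, INFO.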
Required Permissions
| Action | Permission |
|---|---|
| View Cluster Logs | iam:project:infrastructure:ceph:logs |
| View Audit Logs | iam:project:infrastructure:ceph:logs |
| View Alerts | iam:project:infrastructure:ceph:read |
Event Types
Cluster Logs
Operational logs from Ceph components:
| Level | Description |
|---|---|
| ERROR | Critical issues requiring immediate attention |
| WARNING | Potential problems or degraded conditions |
| INFO | Normal operational messages |
| DEBUG | Detailed diagnostic information |
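If you read cluster logs directly from Ceph rather than through this page, the raw entries use short priority tags instead of these level names. The sketch below assumes entries shaped like the JSON from `ceph log last --format json`, with a `priority` field such as `[WRN]`. The tag list and the fallback to INFO for unrecognized tags are also assumptions.

```python
# Assumed mapping from Ceph cluster-log priority tags to the levels shown in the UI.
PRIORITY_TO_LEVEL = {
    "[ERR]": "ERROR",
    "[WRN]": "WARNING",
    "[INF]": "INFO",
    "[DBG]": "DEBUG",
}


def level_for(entry: dict) -> str:
    """Return the display level for a raw cluster-log entry.

    `entry` is assumed to look like one element of `ceph log last --format json`
    output. Missing or unrecognized tags fall back to INFO (an assumption).
    """
    tag = str(entry.get("priority", "")).strip().upper()
    return PRIORITY_TO_LEVEL.get(tag, "INFO")
```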
Audit Logs
Security and compliance logs tracking:
- User authentication events
- Configuration changes
- Administrative operations
- Access control modifications
Prometheus Alerts
Active alerts from the monitoring system:
| Severity | Description |
|---|---|
| CRITICAL | Severe issues requiring immediate action |
| WARNING | Conditions that may become critical if not addressed |
| INFO | Informational alerts for awareness |
Each alert also has a state:
| State | Description |
|---|---|
| active | Alert is currently firing |
| suppressed | Alert is silenced or inhibited |
How to View Events
Select Cluster
Choose a Ceph cluster from the cluster dropdown. Only ready (bootstrapped) clusters will show events.
Select Event Type
Choose a tab to view specific event types:
- Cluster Logs: Operational messages from Ceph daemons
- Audit Logs: Security and compliance events
- Alerts: Active Prometheus alerts
Review Statistics
The summary cards show:
- Total Events/Alerts: Count of all events in the current view
- Critical/Errors: Critical alerts or error logs (pulsing if > 0)
- Warnings: Warning-level events (pulsing if > 0)
- Info: Informational events
- Status: Overall status based on severity counts
How to View Cluster Logs
Review Log Entries
Each log entry shows:
- Timestamp: When the event occurred
- Level: Severity (ERROR, WARNING, INFO, DEBUG)
- Source: Component that generated the log
- Message: Detailed event description
Filter by Level
Click the level filter to show only specific severity levels. Multiple levels can be selected.
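Filtering on several levels at once is a set-membership test. The following is a sketch. It assumes each entry is a dict with a `level` key. It also assumes an empty selection means "show everything." Both are assumptions about how the filter behaves.

```python
from typing import Iterable


def filter_by_levels(entries: Iterable[dict], selected: set[str]) -> list[dict]:
    """Keep only entries whose level is in `selected` (case-insensitive).

    An empty selection is treated as "no filter" (an assumption about UI
    behavior), so every entry is returned.
    """
    wanted = {level.upper() for level in selected}
    if not wanted:
        return list(entries)
    return [e for e in entries if (e.get("level") or "").upper() in wanted]
```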
How to View Audit Logs
Review Audit Entries
Audit logs capture:
- Administrative commands executed
- Configuration changes
- User activities
- Access control events
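Audit entries record who ran which command. The sketch below assumes monitor audit messages follow the layout shown in the code comment. The exact format can vary between Ceph releases. `parse_audit` is a hypothetical helper that extracts the acting entity and the command prefix.

```python
import json
import re
from typing import Optional

# Assumed shape of a monitor audit-channel message, for example:
#   from='client.? 10.0.0.5:0/1234' entity='client.admin' cmd=[{"prefix": "osd set", "key": "noout"}]: dispatch
# The cmd list may also appear wrapped in single quotes, e.g. cmd='[...]': finished
AUDIT_RE = re.compile(
    r"entity='(?P<entity>[^']*)'\s+cmd='?(?P<cmd>\[.*\])'?:\s*(?P<status>\w+)\s*$"
)


def parse_audit(message: str) -> Optional[dict]:
    """Extract who ran what from an audit message; return None if it does not match."""
    m = AUDIT_RE.search(message)
    if m is None:
        return None
    try:
        commands = json.loads(m.group("cmd"))
    except json.JSONDecodeError:
        return None
    # Use the first command object's "prefix" (e.g. "osd set") as the command name.
    first = commands[0] if commands and isinstance(commands[0], dict) else {}
    return {
        "entity": m.group("entity"),
        "command": first.get("prefix", ""),
        "status": m.group("status"),
    }
```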
How to View Alerts
Review Alert Details
Each alert shows:
- Started: When the alert began firing
- Severity: Critical, Warning, or Info
- Alert: Name of the alert rule
- Summary: Brief description of the issue
- Description: Detailed explanation
- State: Active or Suppressed
Statistics Cards
Total Events/Alerts
The total count of events in the currently selected view (Cluster Logs, Audit Logs, or Alerts).
Critical/Errors
Count of critical alerts or error-level logs. A pulsing indicator appears when this count is greater than zero, indicating issues requiring attention.
Warnings
Count of warning-level events. A pulsing indicator appears when warnings exist.
Info
Count of informational events for general awareness.
Status
Overall status based on event counts:
- Critical: Red indicator when critical/error events exist
- Warning: Amber indicator when warnings exist (no criticals)
- Healthy: Green indicator when no warnings or errors
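The card values and overall status follow directly from the rules listed above. The following sketch shows that logic. It assumes that CRITICAL and ERROR both feed the Critical/Errors card, and that DEBUG events count only toward the total. The function and key names are hypothetical.

```python
from collections import Counter


def summarize(levels: list[str]) -> dict:
    """Compute card values from a list of event levels or alert severities."""
    counts = Counter(level.upper() for level in levels)
    # Assumed grouping: CRITICAL alerts and ERROR logs share one card.
    critical = counts["CRITICAL"] + counts["ERROR"]
    warnings = counts["WARNING"]
    if critical > 0:
        status = "Critical"  # red indicator
    elif warnings > 0:
        status = "Warning"  # amber indicator
    else:
        status = "Healthy"  # green indicator
    return {
        "total": len(levels),  # DEBUG counts here only
        "critical_errors": critical,
        "warnings": warnings,
        "info": counts["INFO"],
        "status": status,
        # The two cards that pulse when their count is above zero.
        "pulse_critical": critical > 0,
        "pulse_warnings": warnings > 0,
    }
```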
Log Entry Fields
Cluster and Audit Logs
| Field | Description |
|---|---|
| Timestamp | Date and time the event occurred |
| Level | Severity: ERROR, WARNING, INFO, DEBUG |
| Source | Ceph component that generated the event |
| Message | Detailed description of the event |
Alerts
| Field | Description |
|---|---|
| Started | When the alert started firing |
| Severity | Alert importance: CRITICAL, WARNING, INFO |
| Alert | Name of the Prometheus alert rule |
| Summary | Brief description of the condition |
| Description | Detailed explanation and context |
| State | Current alert state: active, suppressed |
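These columns correspond to fields in the Prometheus Alertmanager v2 API. This page does not confirm that it reads from `GET /api/v2/alerts`, so the sketch below is an assumption. Under that assumption, a table row can be built as shown. `alert_row` is a hypothetical helper, and missing fields default to empty strings.

```python
def alert_row(alert: dict) -> dict:
    """Map one alert object from Alertmanager's GET /api/v2/alerts to table columns."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    return {
        "Started": alert.get("startsAt", ""),
        "Severity": labels.get("severity", "").upper(),
        "Alert": labels.get("alertname", ""),
        "Summary": annotations.get("summary", ""),
        "Description": annotations.get("description", ""),
        # Alertmanager reports "active", "suppressed", or "unprocessed".
        "State": alert.get("status", {}).get("state", ""),
    }
```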
Common Alert Types
| Alert | Severity | Description |
|---|---|---|
| CephHealthError | Critical | Cluster health is HEALTH_ERR |
| CephHealthWarning | Warning | Cluster health is HEALTH_WARN |
| CephOSDDown | Warning | One or more OSDs are down |
| CephOSDNearFull | Warning | OSD approaching full capacity |
| CephOSDFull | Critical | OSD is full, writes blocked |
| CephPGNotScrubbed | Warning | PGs haven’t been scrubbed recently |
| CephMonQuorumAtRisk | Warning | Monitor quorum may be lost |
| CephMgrModuleCrash | Warning | Manager module has crashed |
Troubleshooting
No events showing
No events showing
- Verify the cluster is bootstrapped and ready
- Check that logging is enabled in the Ceph configuration
- Ensure you have the required permissions
- Try refreshing the page
Missing audit logs
Missing audit logs
- Audit logging must be enabled in the Ceph configuration
- Check that `mgr/dashboard/AUDIT_API_ENABLED` is set to true
- Verify the cluster dashboard is accessible
Alerts not appearing
Alerts not appearing
- Prometheus must be deployed in the cluster
- Alertmanager must be configured
- Check Prometheus connectivity
- Verify alert rules are defined
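To check connectivity and confirm that alert rules exist, you can query Prometheus's `/api/v1/rules` endpoint. The following sketch uses only the standard library. `prometheus_url` is a placeholder for your own Prometheus address, and the helper names are hypothetical.

```python
import json
import urllib.request


def alerting_rule_names(rules_response: dict) -> list[str]:
    """Return the names of alerting rules in a Prometheus /api/v1/rules response."""
    names = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules have type "recording" and are skipped.
            if rule.get("type") == "alerting":
                names.append(rule["name"])
    return names


def fetch_rules(prometheus_url: str, timeout: float = 5.0) -> dict:
    """Fetch rule definitions; raises urllib.error.URLError if Prometheus is unreachable."""
    url = prometheus_url.rstrip("/") + "/api/v1/rules"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

If `fetch_rules` raises a `URLError`, Prometheus is not reachable. If `alerting_rule_names(fetch_rules(prometheus_url))` returns an empty list, Prometheus is reachable but has no alerting rules loaded.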
Too many events to review
Too many events to review
- Use level/severity filters to focus on important events
- Search for specific keywords
- Review critical and error events first
- Consider setting up automated alerting
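Keyword search can be implemented as a case-insensitive substring match. The following sketch assumes entries are dicts with `message` and `source` keys. These are hypothetical field names based on the log columns.

```python
def search(entries: list[dict], keyword: str) -> list[dict]:
    """Return entries whose message or source contains `keyword`, ignoring case."""
    needle = keyword.casefold()
    return [
        e for e in entries
        if needle in (e.get("message") or "").casefold()
        or needle in (e.get("source") or "").casefold()
    ]
```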
Alert stuck in active state
Alert stuck in active state
- The underlying condition may still exist
- Check cluster health on the Clusters page
- Investigate the specific issue mentioned in the alert
- Alerts resolve automatically when conditions clear
Events showing incorrect timestamps
Events showing incorrect timestamps
- Check time synchronization (NTP) on cluster nodes
- Clock skew can affect event ordering
- Verify system timezone settings
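Normalizing every timestamp to UTC before sorting prevents ordering errors caused by mixed timezone offsets. This does not correct clocks that are actually skewed; only NTP fixes that. The following sketch assumes timestamps like `2024-05-01T10:15:30.123456+0200`. The exact format is an assumption.

```python
from datetime import datetime, timezone

# Assumed timestamp layout, e.g. "2024-05-01T10:15:30.123456+0200".
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def to_utc(stamp: str) -> datetime:
    """Parse an offset-aware timestamp and convert it to UTC."""
    return datetime.strptime(stamp, STAMP_FORMAT).astimezone(timezone.utc)


def order_events(events: list[dict]) -> list[dict]:
    """Sort events by their UTC instant rather than by the raw timestamp string."""
    return sorted(events, key=lambda e: to_utc(e["timestamp"]))
```

A plain string sort would place `09:00 +0000` before `10:15 +0200`. In UTC, however, the second event happened first, at 08:15.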
FAQ
What's the difference between cluster logs and audit logs?
What's the difference between cluster logs and audit logs?
Cluster Logs: Operational messages about cluster health, daemon status, and system events. Used for troubleshooting and monitoring.
Audit Logs: Security-focused logs tracking administrative actions, configuration changes, and user activities. Used for compliance and security audits.
How long are logs retained?
How long are logs retained?
Log retention depends on your Ceph configuration:
- Default retention varies by Ceph version
- The dashboard shows recent logs from the cluster
- For long-term retention, export logs to external systems
Can I export events for analysis?
Can I export events for analysis?
Currently, events cannot be directly exported from this interface. For log aggregation:
- Configure rsyslog or journald forwarding
- Use Loki or similar log aggregation systems
- Set up Prometheus remote write for alerts
What does a suppressed alert mean?
What does a suppressed alert mean?
Suppressed alerts are:
- Silenced by an administrator
- Inhibited by another alert rule
- Temporarily muted during maintenance
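In the Alertmanager v2 API, a suppressed alert's `status` object contains two lists. `silencedBy` lists the IDs of matching silences, and `inhibitedBy` lists the fingerprints of the inhibiting alerts. The following sketch turns those fields into a readable reason. It assumes alerts come from that API, and `suppression_reason` is a hypothetical helper.

```python
def suppression_reason(alert: dict) -> str:
    """Explain why an Alertmanager v2 alert is suppressed, based on its status."""
    status = alert.get("status", {})
    if status.get("state") != "suppressed":
        return "not suppressed"
    reasons = []
    if status.get("silencedBy"):
        # IDs of silences (typically created by an administrator).
        reasons.append("silenced by " + ", ".join(status["silencedBy"]))
    if status.get("inhibitedBy"):
        # Fingerprints of other alerts whose inhibition rules apply.
        reasons.append("inhibited by " + ", ".join(status["inhibitedBy"]))
    return "; ".join(reasons) or "suppressed (reason not reported)"
```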
How quickly do events appear?
How quickly do events appear?
- Cluster logs appear in near real-time
- Audit logs appear as actions are performed
- Alerts appear when Prometheus evaluates rules (typically every 15-60 seconds)
Why are some log sources showing as 'unknown'?
Why are some log sources showing as 'unknown'?
Some events may not have a source identified when:
- The source daemon information is unavailable
- The event came from a system-level process
- The log format doesn’t include source details
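One way to produce the 'unknown' label is to fall back to it when the source field is missing or blank. The following sketch assumes the daemon name is stored under `name`, as in `ceph log last --format json` output. That field name is an assumption, and `source_of` is a hypothetical helper.

```python
def source_of(entry: dict) -> str:
    """Return the originating daemon (e.g. "osd.3"), or "unknown" if none is available.

    Only the assumed `name` field is checked; other fields are ignored.
    """
    name = entry.get("name")
    if isinstance(name, str) and name.strip():
        return name.strip()
    return "unknown"
```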
How do I acknowledge or silence alerts?
How do I acknowledge or silence alerts?
Alert management (silencing, acknowledgment) is handled through:
- Prometheus Alertmanager directly
- Ceph Dashboard’s native alert management
- Integration with incident management systems
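To silence an alert through Alertmanager directly, send a POST request with a JSON body to `/api/v2/silences`. The following sketch builds such a body for a single alert rule, starting immediately. The helper name and the example values are illustrative. Older Alertmanager releases may not recognize the `isEqual` matcher field.

```python
from datetime import datetime, timedelta, timezone


def silence_payload(alertname: str, hours: float, created_by: str, comment: str) -> dict:
    """Build a request body for Alertmanager's POST /api/v2/silences.

    The silence matches one alert rule by its `alertname` label, starts now,
    and ends after `hours` hours.
    """
    start = datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False, "isEqual": True}
        ],
        "startsAt": start.isoformat(),  # RFC 3339 with a +00:00 offset
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

Serialize the result with `json.dumps` and POST it to your Alertmanager's `/api/v2/silences` endpoint. The response contains the ID of the new silence.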
What triggers the pulsing indicator on cards?
What triggers the pulsing indicator on cards?
The pulsing animation indicates active issues:
- Critical/Errors card: Pulses when count > 0
- Warnings card: Pulses when count > 0