Key Concepts
Consumer Group
Offset
Lag
Coordinator
Required Permissions
| Action | Permission |
|---|---|
| View consumer groups | iam:project:infrastructure:kafka:read |
| Reset offsets | iam:project:infrastructure:kafka:write |
| Delete consumer groups | iam:project:infrastructure:kafka:delete |
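The table above can be read as a lookup from action to permission string. The following is a minimal sketch of such a check, not the platform's actual authorization code. The action keys (`view`, `reset_offsets`, `delete`) and the `is_allowed` helper are hypothetical names, and the sketch assumes permissions are not hierarchical (holding write does not imply read).

```python
# Permission strings are copied from the table above; the action keys are
# hypothetical names chosen for this sketch.
REQUIRED_PERMISSION = {
    "view": "iam:project:infrastructure:kafka:read",
    "reset_offsets": "iam:project:infrastructure:kafka:write",
    "delete": "iam:project:infrastructure:kafka:delete",
}


def is_allowed(action: str, granted: set[str]) -> bool:
    """Return True if the caller holds the permission the action requires.

    Assumption: each action needs exactly the one permission listed in the
    table, with no inheritance between read, write, and delete.
    """
    return REQUIRED_PERMISSION[action] in granted


granted = {"iam:project:infrastructure:kafka:read"}
print(is_allowed("view", granted))    # True
print(is_allowed("delete", granted))  # False
```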
Consumer Group States
| State | Description |
|---|---|
| Stable | Group is active with assigned partitions. Members are consuming normally. |
| Empty | Group exists but has no active members. Offsets are still retained. |
| PreparingRebalance | Group is preparing to reassign partitions due to member changes. |
| CompletingRebalance | Group is finalizing partition assignments after rebalance. |
| Dead | Group has been deleted or expired due to inactivity. |
How to View Consumer Groups
Filter by State
How to View Consumer Group Details
The detail page provides three tabs:
Overview Tab
Shows subscribed topics and general group information:
- Group ID
- Current state
- Member count
- Coordinator broker
Members Tab
Lists all active consumers in the group:
- Client ID: Application identifier
- Client Host: IP address or hostname
- Member ID: Unique member identifier assigned by Kafka
- Assigned Partitions: Topic:partition pairs assigned to this member
Offsets Tab
Shows consumption progress for each partition:
- Current Offset: Last committed offset
- Log End Offset: Latest message position in the partition
- Lag: Number of messages waiting to be consumed (Log End Offset minus Current Offset)
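The relationship between the three Offsets tab columns can be sketched in a few lines. This is an illustrative calculation only; the topic name and offset values are made up, and clamping negative results to zero is an assumption of the sketch rather than documented behavior.

```python
def partition_lag(current_offset: int, log_end_offset: int) -> int:
    """Lag for one partition: log end offset minus last committed offset.

    Assumption: a negative difference (for example after a topic was
    recreated) is reported as 0 in this sketch.
    """
    return max(log_end_offset - current_offset, 0)


# Hypothetical sample data: (topic, partition) -> (current, log end)
offsets = {
    ("orders", 0): (120, 150),
    ("orders", 1): (300, 300),
}

lag_by_partition = {
    tp: partition_lag(current, end) for tp, (current, end) in offsets.items()
}
total_lag = sum(lag_by_partition.values())

print(lag_by_partition)  # {('orders', 0): 30, ('orders', 1): 0}
print(total_lag)         # 30
```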
Understanding Lag
Lag indicates how far behind a consumer group is from the latest messages.
| Lag Level | Meaning | Action |
|---|---|---|
| 0-1,000 | Healthy | Consumers are keeping up |
| 1,000-10,000 | Warning | Monitor closely, may need attention |
| > 10,000 | Critical | Consumers falling behind significantly |
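The thresholds in the table can be expressed as a small classifier, for example in an alerting script. This is a sketch under one assumption: the table's ranges meet at 1,000 and 10,000, so the boundary values are assigned to the lower level here. The function name `lag_level` is hypothetical.

```python
def lag_level(lag: int) -> str:
    """Map a lag value to the levels used in the table above.

    Assumption: exactly 1,000 counts as Healthy and exactly 10,000 as
    Warning, since only "> 10,000" is stated to be Critical.
    """
    if lag < 0:
        raise ValueError("lag cannot be negative")
    if lag <= 1_000:
        return "Healthy"
    if lag <= 10_000:
        return "Warning"
    return "Critical"


print(lag_level(250))     # Healthy
print(lag_level(4_000))   # Warning
print(lag_level(25_000))  # Critical
```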
Common Causes of High Lag
- Consumer processing is too slow
- Not enough consumers for the number of partitions
- Consumer application errors causing reprocessing
- Network issues between consumers and brokers
- Unbalanced partition distribution
How to Reset Consumer Offsets
Resetting offsets allows you to reprocess messages or skip ahead.
Choose Reset Method
- Earliest: Reprocess all available messages from the beginning
- Latest: Skip to the end, only consume new messages
- Timestamp: Reset to a specific point in time
- Offset: Reset to a specific offset number
Reset Types
| Type | Use Case |
|---|---|
| Earliest | Reprocess all data (disaster recovery, data replay) |
| Latest | Skip backlog, only process new messages |
| Timestamp | Replay from a specific point in time |
| Offset | Precise control for specific partition offsets |
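To make the four reset types concrete, the sketch below models how each one could resolve to a target offset for a single partition. It is a pure-Python illustration, not the reset implementation. The function name and parameters are hypothetical, `timestamps[i]` is assumed to be the timestamp of the message at offset `earliest + i`, a timestamp newer than every message is assumed to resolve to the latest offset, and an out-of-range explicit offset is assumed to be clamped to the available range.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def resolve_reset_offset(
    method: str,
    earliest: int,
    latest: int,
    value: Optional[int] = None,
    timestamps: Sequence[int] = (),
) -> int:
    """Return the offset a partition would be reset to (illustrative model)."""
    if method == "earliest":
        return earliest
    if method == "latest":
        return latest
    if method == "timestamp":
        if value is None:
            raise ValueError("timestamp reset needs a timestamp value")
        # First message whose timestamp is >= value; latest if there is none.
        return earliest + bisect_left(timestamps, value)
    if method == "offset":
        if value is None:
            raise ValueError("offset reset needs an offset value")
        # Assumption: clamp into the range that still exists in the log.
        return min(max(value, earliest), latest)
    raise ValueError(f"unknown reset method: {method}")


# Hypothetical partition holding offsets 5, 6, 7 (log end offset 8).
timestamps = [1_000, 2_000, 3_000]
print(resolve_reset_offset("earliest", 5, 8))                                      # 5
print(resolve_reset_offset("timestamp", 5, 8, value=1_500, timestamps=timestamps)) # 6
print(resolve_reset_offset("offset", 5, 8, value=100))                             # 8
```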
How to Delete a Consumer Group
Troubleshooting
Consumer group stuck in rebalancing
- A consumer may be taking too long to process messages
- Check for consumer application errors or crashes
- Increase session.timeout.ms and heartbeat.interval.ms
- Reduce max.poll.records if processing is slow
- Check network connectivity between consumers and brokers
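The settings named above could be collected as follows. The property names are real Kafka consumer settings, but the values are illustrative assumptions, not recommendations for any specific workload. The `heartbeat_is_safe` helper is hypothetical; it encodes the common Kafka guidance that the heartbeat interval should be no more than one third of the session timeout.

```python
# Illustrative values only; tune them for your own processing times.
consumer_config = {
    "session.timeout.ms": 45_000,     # time without heartbeats before eviction
    "heartbeat.interval.ms": 15_000,  # how often the consumer heartbeats
    "max.poll.records": 100,          # smaller batches for slow processing
}


def heartbeat_is_safe(config: dict) -> bool:
    """Check that the heartbeat interval is at most 1/3 of the session timeout."""
    return config["heartbeat.interval.ms"] * 3 <= config["session.timeout.ms"]


print(heartbeat_is_safe(consumer_config))  # True
```

Raising both timeouts together keeps this ratio intact; raising only the heartbeat interval would leave fewer heartbeats per session window.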
Consumer lag keeps increasing
- Add more consumer instances (up to partition count)
- Optimize message processing in your application
- Increase partition count to enable more parallelism
- Check if consumers are committing offsets correctly
- Verify no consumer application errors
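The "up to partition count" limit exists because each partition is consumed by at most one member of a group. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def consumer_utilization(consumers: int, partitions: int) -> tuple[int, int]:
    """Return (active, idle) consumer counts for one group.

    Each partition is read by at most one group member, so consumers beyond
    the partition count receive no assignment.
    """
    active = min(consumers, partitions)
    idle = consumers - active
    return active, idle


print(consumer_utilization(3, 6))  # (3, 0): room to add up to 3 more consumers
print(consumer_utilization(8, 6))  # (6, 2): 2 consumers sit idle
```

When the second case applies, adding consumers no longer reduces lag; increasing the partition count is the remaining option for more parallelism.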
Cannot delete consumer group
- Consumer group must be Empty or Dead, not Stable
- Stop all consumer applications first
- Wait for the session timeout to expire (session.timeout.ms, default 10 seconds in older clients, 45 seconds since Kafka 3.0)
- Verify no lingering consumer connections
Cannot reset offsets
- Consumer group must not be in Stable state
- Stop all consumers before resetting
- You need write permission
- Verify the topic and partitions exist
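Both the delete and the reset guidance come down to the group having no active members. The pre-flight check below is a sketch under that assumption: it takes the stricter reading that "all consumers stopped" means the group is Empty or Dead, and the `GroupState` enum and `is_inactive` helper are hypothetical names built from the state table earlier in this page.

```python
from enum import Enum


class GroupState(Enum):
    STABLE = "Stable"
    EMPTY = "Empty"
    PREPARING_REBALANCE = "PreparingRebalance"
    COMPLETING_REBALANCE = "CompletingRebalance"
    DEAD = "Dead"


def is_inactive(state: GroupState) -> bool:
    """True when the group has no active members.

    Assumption: only Empty and Dead count as inactive; a group that is
    rebalancing still has members and is treated as active here.
    """
    return state in {GroupState.EMPTY, GroupState.DEAD}


print(is_inactive(GroupState("Empty")))   # True
print(is_inactive(GroupState("Stable")))  # False
```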
Offsets show as 0 or negative
- Consumer group may not have committed offsets yet
- Consumer may be using manual offset management
- Topic may have been recreated
- Check if enable.auto.commit is configured correctly
Consumer group appears and disappears
- Consumers may be using dynamic group membership
- Short session.timeout.ms causes frequent disconnects
- Application may be crashing and restarting
- Check for resource constraints (memory, CPU) on consumer hosts
Partitions not evenly distributed
- Default partition assignment strategies may not balance evenly
- Consider using StickyAssignor or CooperativeStickyAssignor
- Ensure consumer count doesn’t exceed partition count
- Check if some consumers have different subscription patterns
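To see how an assignment strategy can leave partitions unevenly spread, the sketch below compares two simplified models: range-style assignment, which splits each topic separately, and round-robin, which deals all partitions out in turn. These are illustrative models, not Kafka's assignor implementations, and they assume every consumer subscribes to every topic. The topic and consumer names are made up.

```python
from collections import defaultdict


def range_assign(consumers, topics):
    """Per topic, give each consumer a contiguous range; earlier consumers get extras."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    for topic, n_partitions in sorted(topics.items()):
        per_member, extra = divmod(n_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            count = per_member + (1 if i < extra else 0)
            assignment[member] += [(topic, p) for p in range(start, start + count)]
            start += count
    return dict(assignment)


def round_robin_assign(consumers, topics):
    """Deal every partition of every topic out to consumers in turn."""
    assignment = defaultdict(list)
    members = sorted(consumers)
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return dict(assignment)


topics = {"orders": 3, "payments": 3}  # topic -> partition count
consumers = ["c1", "c2"]

print({m: len(p) for m, p in range_assign(consumers, topics).items()})
# {'c1': 4, 'c2': 2}
print({m: len(p) for m, p in round_robin_assign(consumers, topics).items()})
# {'c1': 3, 'c2': 3}
```

With an odd partition count per topic, the range-style model gives the first consumer the extra partition of every topic, which is one way the imbalance described above can arise.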
FAQ
How many consumers should I have?
What happens when a consumer crashes?
If a consumer sends no heartbeat within session.timeout.ms (default 10 seconds in older clients, 45 seconds since Kafka 3.0), the coordinator marks the consumer as dead and triggers a rebalance. Partitions are reassigned to the remaining consumers. Some messages may be reprocessed, depending on when offsets were last committed.
Should I use auto-commit or manual commit?
What is a rebalance and why does it happen?
How long are offsets retained?
Offsets are retained for the period set by offsets.retention.minutes (default 7 days). If a consumer group is inactive for longer than this period, all of its offsets are deleted, and consumers restart from the position set by auto.offset.reset.
Can I have multiple consumer groups reading the same topic?
What's the difference between lag and consumer lag?
How do I handle message reprocessing after offset reset?
Best Practices
Consumer Configuration
Monitoring Strategy
Monitor these metrics continuously:
- Consumer lag per partition (should be stable or decreasing)
- Rebalance frequency (frequent rebalances indicate issues)
- Commit rate (should match processing rate)
- Consumer group state (should be Stable during normal operation)
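The "stable or decreasing" rule for lag can be checked over a window of samples. The following is a minimal sketch that compares the first and last sample in the window; the `lag_trend` name, the `tolerance` parameter, and the sample values are assumptions, and a production monitor would likely use a more robust trend estimate.

```python
def lag_trend(samples: list[int], tolerance: int = 0) -> str:
    """Classify a window of lag samples as increasing, decreasing, or stable.

    Assumption: the trend is judged by the difference between the last and
    first sample, ignoring changes no larger than `tolerance`.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    delta = samples[-1] - samples[0]
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "stable"


print(lag_trend([100, 150, 400]))                # increasing
print(lag_trend([500, 300, 100]))                # decreasing
print(lag_trend([100, 105, 102], tolerance=10))  # stable
```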
High Availability
- Run at least 2 consumers per group for failover
- Don’t exceed partition count with consumers
- Use rack-aware consumer assignment in multi-datacenter setups
- Configure appropriate timeouts for your network environment
Offset Management
- Commit offsets after successful processing, not before
- Use synchronous commits for critical data
- Consider storing offsets externally for exactly-once semantics
- Implement dead letter queues for failed messages
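The commit-after-processing and dead letter queue practices can be combined in one loop. The sketch below simulates them in memory with no Kafka client involved: `consume`, `handler`, and `max_attempts` are hypothetical names, the "commit" is just a counter, and the dead letter queue is a list. The point it illustrates is that the offset only advances once a message has either been processed or parked.

```python
def consume(messages, handler, max_attempts=3):
    """Process messages in order, committing only after each one is handled.

    Returns (committed_offset, dead_letters). The committed offset is the next
    offset to read, which is the convention Kafka uses for commits.
    """
    committed_offset = 0
    dead_letters = []
    for offset, payload in enumerate(messages):
        for attempt in range(1, max_attempts + 1):
            try:
                handler(payload)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Park the message so it does not block the partition.
                    dead_letters.append((offset, payload, str(exc)))
        # Commit after successful processing or dead-lettering, never before.
        committed_offset = offset + 1
    return committed_offset, dead_letters


def handler(payload):
    # Hypothetical processing step that rejects one payload.
    if payload == "bad":
        raise ValueError("cannot process")


committed, dead = consume(["a", "bad", "c"], handler)
print(committed)  # 3
print(dead)       # [(1, 'bad', 'cannot process')]
```

If the process crashed before a commit, the uncommitted message would be delivered again on restart, which is why processing should tolerate duplicates.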