RAFT Consensus Plugin
The RAFT consensus plugin ships with GrADyS-SIM NextGen and provides a fault-tolerant coordination layer for protocols that need to agree on shared values. This guide walks through the essentials for configuring the plugin, wiring it into a protocol, and validating the showcase scenario.
Key Capabilities
- Fault-Tolerant Consensus: Robust leader election with automatic recovery
- Active Node Discovery: Dynamic cluster size detection for accurate majority calculations
- Heartbeat-Based Failure Detection: Automatic detection of failed nodes
- Massive Failure Recovery: Recovers from scenarios where a large fraction of the nodes fails
- Scalable Architecture: Works with any number of nodes (10, 50, 100+)
- GradySim Integration: Seamless integration with GrADyS-SIM NextGen simulation framework
- No Log Replication: Lightweight implementation focused on consensus values
- Dynamic Majority Calculation: Thresholds adapt to actual active nodes
- Dual Operation Modes: Classic Raft and Fault-Tolerant modes
- Dispatcher Integration: Automatic handling of RAFT-specific packets and timers
- Optional Statistics: State inspection helpers for debugging
Quick Start
Basic Setup
from gradysim.protocol.interface import IProtocol
from gradysim.protocol.messages.telemetry import Telemetry
from gradysim.protocol.plugin.raft import RaftConfig, RaftMode, RaftConsensusPlugin

class RaftProtocol(IProtocol):
    def initialize(self) -> None:
        # Configure consensus
        config = RaftConfig()
        config.set_election_timeout(150, 300)  # 150-300ms election timeout
        config.set_heartbeat_interval(50)      # 50ms heartbeat interval
        config.add_consensus_variable("sequence", int)
        config.add_consensus_variable("leader_position", str)
        config.set_logging(enable=True, level="INFO")

        # Choose operation mode
        config.set_raft_mode(RaftMode.FAULT_TOLERANT)  # Default: fault-tolerant mode
        # config.set_raft_mode(RaftMode.CLASSIC)       # Alternative: classic mode

        # Initialize the Raft consensus plugin
        self.consensus = RaftConsensusPlugin(config=config, protocol=self)

        # Set known nodes and start consensus
        self.consensus.set_known_nodes(list(range(10)))
        self.consensus.start()

    def handle_timer(self, timer: str) -> None:
        # Non-RAFT timers continue to arrive here
        if timer == "app_timer":
            self.provider.schedule_timer("app_timer", self.provider.current_time() + 1)

    def handle_telemetry(self, telemetry: Telemetry) -> None:
        # Propose values if leader
        if self.consensus.is_leader():
            self.consensus.propose_value("sequence", int(telemetry.simulation_time))
            self.consensus.propose_value("leader_position", "north")

    def finish(self) -> None:
        self.consensus.stop()
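To actually run this protocol, it has to be mounted in a simulation. Below is a minimal sketch assuming the standard GrADyS-SIM NextGen builder API; module paths, handler names, and the chosen duration and transmission range are illustrative and should be checked against your installed version:

from gradysim.simulator.handler.communication import CommunicationHandler, CommunicationMedium
from gradysim.simulator.handler.mobility import MobilityHandler
from gradysim.simulator.handler.timer import TimerHandler
from gradysim.simulator.simulation import SimulationBuilder, SimulationConfiguration

builder = SimulationBuilder(SimulationConfiguration(duration=60))
builder.add_handler(TimerHandler())     # drives election and heartbeat timers
builder.add_handler(MobilityHandler())  # generates the telemetry events used above
builder.add_handler(CommunicationHandler(CommunicationMedium(transmission_range=100)))

# Ten nodes, matching set_known_nodes(list(range(10)))
for i in range(10):
    builder.add_node(RaftProtocol, (i * 10, 0, 0))

simulation = builder.build()
simulation.start_simulation()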
Advanced Usage with Failure Detection
def initialize(self) -> None:
    config = RaftConfig()
    config.set_election_timeout(150, 300)
    config.set_heartbeat_interval(50)
    config.add_consensus_variable("sequence", int)
    config.set_raft_mode(RaftMode.FAULT_TOLERANT)

    # Configure failure detection parameters
    failure_config = config.get_failure_config()
    failure_config.set_failure_threshold(2)   # 2 failed heartbeats to mark as failed
    failure_config.set_recovery_threshold(3)  # 3 successful heartbeats to recover
    failure_config.set_detection_interval(2)  # Check every 2 heartbeats
    failure_config.set_heartbeat_timeout(4)   # 4× heartbeat_interval = 200ms timeout

    self.consensus = RaftConsensusPlugin(config=config, protocol=self)
    self.consensus.set_known_nodes(list(range(10)))
    self.consensus.start()
Dispatcher Integration
The plugin automatically prefixes every internal message and timer with __RAFT__
when talking to the provider. Dispatcher hooks intercept any packet or timer that carries this prefix and stop it from reaching the protocol. All other events pass through unchanged, so protocol logic stays clean.
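Conceptually, the interception works like a guard in front of your handlers. The sketch below is illustrative only; the real hooks live inside the plugin's dispatcher, and the helper name dispatch_timer is hypothetical:

RAFT_PREFIX = "__RAFT__"

def dispatch_timer(plugin, protocol, timer: str) -> None:
    # Hypothetical dispatcher logic: RAFT timers are consumed by the
    # plugin; everything else is forwarded to the protocol untouched.
    if timer.startswith(RAFT_PREFIX):
        plugin.handle_timer(timer)    # consumed internally
    else:
        protocol.handle_timer(timer)  # reaches your code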
Architecture Overview
Component Relationships
RaftConsensus            RaftNode             GradySim
  (Facade)     ◄──►    (Core Logic)   ◄──►   (Platform)
      │                     │                     │
      ▼                     ▼                     ▼
 RaftConfig         HeartbeatDetector        Simulation
 (Settings)            (Internal)            Framework
- RaftConsensus: Public facade for consensus operations
- RaftNode: Core Raft algorithm implementation
- GradySim: Simulation framework integration
- RaftConfig: Configuration and settings management
- HeartbeatDetector: Internal failure detection (Fault-Tolerant mode only)
Message Flow
- Consensus Request: RaftConsensus.propose_value() is called
- Internal Processing: RaftNode processes the request
- Platform Communication: Uses the GradySim provider to send messages
- Response Handling: Processes responses via the dispatcher
- State Update: Updates consensus state and notifies callers
Key Innovations
1. Active Node Discovery System
The RAFT plugin introduces a discovery phase before elections that solves the fundamental problem of unknown cluster size after failures:
# Traditional Raft: Uses fixed cluster size
majority = (total_nodes // 2) + 1 # Always uses total_nodes
# RAFT Plugin: Discovers actual active nodes
discovered_active = discover_active_nodes() # Dynamic discovery
majority = (discovered_active // 2) + 1 # Uses actual active nodes
Benefits:
- Accurate majority calculations after failures
- Faster leader election with correct thresholds
- Automatic adaptation to cluster changes
- No manual configuration needed
2. Massive Failure Recovery
The plugin can recover from scenarios that would break traditional Raft implementations:
# Scenario: 10 nodes, 8 fail (including leader)
# Traditional Raft: Stuck - can't calculate majority
# RAFT Plugin: Discovers 2 active nodes, calculates majority = 2, elects leader
3. Dynamic Cluster Adaptation
The system automatically adapts to cluster changes:
# Initial: 10 nodes active
# After failures: 3 nodes active
# RAFT Plugin automatically adjusts majority threshold from 6 to 2
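To make the arithmetic concrete, here is a minimal sketch of the majority formula quoted above (plain Python, not plugin API):

def majority(node_count: int) -> int:
    # Raft majority: strictly more than half of the counted nodes
    return (node_count // 2) + 1

print(majority(10))  # 6 -> threshold with all 10 nodes active
print(majority(3))   # 2 -> threshold after 7 nodes fail
print(majority(2))   # 2 -> even 2 survivors can elect a leader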
4. Dispatcher Integration
The plugin automatically handles RAFT-specific communication without affecting your protocol:
# The plugin automatically prefixes internal messages and timers with "__RAFT__"
# Your protocol only sees non-RAFT events
def handle_timer(self, timer: str) -> None:
    # RAFT timers are automatically intercepted
    # Only your application timers reach here
    if timer == "app_timer":
        self.provider.schedule_timer("app_timer", self.provider.current_time() + 1)
Configuring the Plugin
Election and Heartbeat
- set_election_timeout(min_ms, max_ms): randomized election timeout window. Typical values are 150-300 ms.
- set_heartbeat_interval(interval_ms): heartbeat cadence. Keep it well below the election timeout, for example 50 ms (see the sanity-check sketch below).
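As a rule of thumb, several heartbeats should fit within the smallest election timeout so followers do not time out spuriously. A minimal illustrative check (not part of the plugin API; the 3× factor is an assumption, not a plugin requirement):

def check_timing(heartbeat_ms: int, election_min_ms: int, election_max_ms: int) -> None:
    # Rule of thumb: at least ~3 heartbeats should fit into the
    # smallest election timeout window.
    assert 0 < heartbeat_ms and election_min_ms < election_max_ms
    assert heartbeat_ms * 3 <= election_min_ms, "heartbeat too slow for this timeout"

check_timing(50, 150, 300)  # the defaults used throughout this guide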
Consensus Variables
Declare everything that must reach agreement before starting the consensus:
config.add_consensus_variable("sequence", int)
config.add_consensus_variable("formation_info", dict)
config.add_consensus_variable("leader_position", str)
config.add_consensus_variable("temperature", float)
config.add_consensus_variable("is_active", bool)
Supported Variable Types:
- Primitive Types: int, float, str, bool
- Complex Types: dict, list, tuple
- None Values: None is also supported
- Custom Objects: any object that can be JSON-serialized
Committed values are available through get_committed_value(name) or get_all_committed_values().
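For example, a complex variable such as the dict declared above can be proposed by the leader and read back on any node. A short sketch using only the calls documented in this guide (the dict contents are arbitrary illustration data):

# Leader side: propose a JSON-serializable dict
if self.consensus.is_leader():
    self.consensus.propose_value("formation_info", {"shape": "V", "spacing_m": 5})

# Any node: read the value back once it has been committed
formation = self.consensus.get_committed_value("formation_info")
if formation is not None:
    print(f"Formation: {formation['shape']} with {formation['spacing_m']} m spacing")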
Operating Modes
The RAFT plugin supports two distinct operation modes, allowing you to choose between classic Raft behavior and fault-tolerant enhancements:
Classic Raft Mode (RaftMode.CLASSIC)
Classic Raft Mode implements the standard Raft protocol as specified by Ongaro and Ousterhout. The only divergence in our system is replication: the plugin targets agreement on consensus values rather than maintaining a replicated log.
Characteristics:
- Fixed Cluster Size: Uses total known nodes for all majority calculations
- Direct Elections: No discovery phase; immediate election on timeout
- Standard Behavior: Classic Raft consensus semantics
- Better Performance: Lower overhead (no discovery phase)
- Deterministic: Predictable behavior for static clusters
- Manual Failure Handling: Requires manual cluster reconfiguration after failures
Use Cases:
- Static clusters with a fixed, known node count
- Environments where performance is critical
- Systems requiring standard Raft semantics
- Scenarios with manual failure management
Fault-Tolerant Raft Mode (RaftMode.FAULT_TOLERANT)
Enhanced Raft implementation with automatic fault tolerance and dynamic cluster adaptation.
Characteristics:
- Dynamic Active Node Detection: Discovery phase before elections
- Adaptive Majority Calculation: Uses the active node count for thresholds
- Massive Failure Recovery: Automatic recovery when a large fraction of nodes fails
- Self-Healing: Automatic adaptation to cluster changes
- Fault Tolerance: No manual intervention needed
- Active Node Information: Real-time access to node information
- Discovery Overhead: Small performance cost for the discovery phase
Use Cases:
- Dynamic environments with node failures
- Distributed systems requiring high availability
- Scenarios with unpredictable node failures
- Systems that need automatic fault recovery
Mode Comparison
| Feature | Classic Mode | Fault-Tolerant Mode |
|---|---|---|
| Discovery Phase | ❌ Disabled | ✅ Enabled |
| Majority Calculation | Fixed (total nodes) | Dynamic (active nodes) |
| Failure Recovery | ❌ Manual | ✅ Automatic |
| Performance | ✅ Faster | ⚡ Slight overhead |
| Massive Failures | ❌ Requires intervention | ✅ Auto-recovery |
| Active Node Info | ❌ Not available | ✅ Available |
| Use Case | Static clusters | Dynamic clusters |
| Standard Raft | ✅ Compliant | Enhanced |
Configuration Examples
Classic Raft Mode Configuration:
config = RaftConfig()
config.set_raft_mode(RaftMode.CLASSIC)
config.set_election_timeout(150, 300)
config.set_heartbeat_interval(50)
# Classic mode: Uses total known nodes for majority
# No discovery phase - direct elections
# Standard Raft semantics
Fault-Tolerant Raft Mode Configuration:
config = RaftConfig()
config.set_raft_mode(RaftMode.FAULT_TOLERANT) # Default mode
config.set_election_timeout(200, 400) # Longer timeouts for discovery
config.set_heartbeat_interval(50)
# Fault-tolerant mode: Uses active node discovery
# Dynamic majority calculations
# Automatic fault recovery
Runtime Mode Detection
# Check current mode
if config.is_classic_mode():
    print("Running in Classic Raft mode")
    # No discovery, standard Raft behavior
elif config.is_fault_tolerant_mode():
    print("Running in Fault-Tolerant Raft mode")
    # Discovery enabled, dynamic fault tolerance

# Get mode programmatically
current_mode = config.get_raft_mode()
print(f"Current mode: {current_mode.value}")
Choosing the Right Mode
Choose Classic Mode When:
- Cluster size is fixed and known
- Performance is critical
- Standard Raft behavior is required
- Manual failure management is acceptable

Choose Fault-Tolerant Mode When:
- Nodes may fail unexpectedly
- Automatic fault recovery is needed
- Cluster size varies dynamically
- High availability is required
- Active node information is needed
Switch modes with config.set_raft_mode(RaftMode.CLASSIC) or config.set_raft_mode(RaftMode.FAULT_TOLERANT).
Failure Detection
config.get_failure_config() exposes the thresholds for the heartbeat detector used in Fault-Tolerant mode:
failure = config.get_failure_config()
failure.set_failure_threshold(2) # 2 failed heartbeats to mark as failed
failure.set_recovery_threshold(3) # 3 successful heartbeats to recover
failure.set_detection_interval(2) # Check every 2 heartbeats
failure.set_heartbeat_timeout(4) # 4 * heartbeat interval
Failure Detection Configuration Options:
# Conservative Detection (High reliability, slower detection)
failure.set_failure_threshold(5) # 5 consecutive failures
failure.set_recovery_threshold(3) # 3 consecutive successes
failure.set_detection_interval(3) # Check every 3 heartbeats
failure.set_heartbeat_timeout(6) # 6× heartbeat_interval = 300ms timeout
# Aggressive Detection (Fast detection, may have false positives)
failure.set_failure_threshold(2) # 2 consecutive failures
failure.set_recovery_threshold(1) # 1 consecutive success
failure.set_detection_interval(1) # Check every heartbeat
failure.set_absolute_timeout(100) # 100ms absolute timeout
# Balanced Detection (Recommended default)
failure.set_failure_threshold(3) # 3 consecutive failures
failure.set_recovery_threshold(2) # 2 consecutive successes
failure.set_detection_interval(2) # Check every 2 heartbeats
failure.set_heartbeat_timeout(4) # 4× heartbeat_interval = 200ms timeout
Logging
The RAFT plugin provides comprehensive logging capabilities for debugging and monitoring consensus behavior. Logging is configured per node and can be customized based on your needs.
Basic Logging Configuration
# Enable logging with default INFO level
config.set_logging(enable=True)
# Enable logging with specific level
config.set_logging(enable=True, level="DEBUG")
config.set_logging(enable=True, level="INFO")
config.set_logging(enable=True, level="WARNING")
config.set_logging(enable=True, level="ERROR")
# Disable logging (only errors will be shown)
config.set_logging(enable=False)
Available Log Levels
| Level | Description | Use Case |
|---|---|---|
| DEBUG | Detailed information for diagnosing problems | Development and debugging |
| INFO | General information about consensus operations | Normal operation monitoring |
| WARNING | Warning messages for potential issues | Production monitoring |
| ERROR | Error messages for serious problems | Error tracking |
Logging Examples
Development and Debugging:
config = RaftConfig()
config.set_logging(enable=True, level="DEBUG")
# This will show:
# - Detailed election processes
# - Message exchanges
# - Timer events
# - State transitions
# - Failure detection details
Production Monitoring:
config = RaftConfig()
config.set_logging(enable=True, level="INFO")
# This will show:
# - Election results
# - Leadership changes
# - Consensus value commits
# - Major state changes
Minimal Logging:
config = RaftConfig()
config.set_logging(enable=True, level="WARNING")
# This will show:
# - Only warnings and errors
# - Critical consensus issues
# - Failure scenarios
Silent Mode:
config = RaftConfig()
config.set_logging(enable=False)
# This will show:
# - Only critical errors
# - System failures
Log Output Examples
When logging is enabled, you'll see output like:
INFO:RaftConsensus-1: RaftConsensus initialized for node 1
INFO:RaftNode-1: Starting Raft consensus for node 1
INFO:RaftNode-1: Node 1 became candidate in term 1
INFO:RaftNode-1: Node 1 became leader in term 1
INFO:RaftNode-1: Proposing value for variable 'sequence': 42
INFO:RaftNode-1: Value committed for variable 'sequence': 42
Logger Names
The plugin creates separate loggers for different components:
- RaftConsensus-{node_id}: Main consensus facade logger
- RaftNode-{node_id}: Internal node implementation logger
This allows you to configure different log levels for different components if needed.
Advanced Logging Configuration
You can also configure logging at the Python logging level:
import logging
# Configure root logger
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Configure specific RAFT loggers
logging.getLogger("RaftConsensus").setLevel(logging.INFO)
logging.getLogger("RaftNode").setLevel(logging.DEBUG)
Working with the Consensus Instance
Basic Operations
Leadership and State:
- is_leader(): Check if this node is the current leader
- get_leader_id(): Get the current leader ID
- get_current_state(): Get the current Raft state (FOLLOWER, CANDIDATE, LEADER)
- get_current_term(): Get the current term number

Value Management:
- propose_value(name, value): Propose a new value (leader only)
- get_committed_value(name): Get the committed value for a variable
- get_all_committed_values(): Get all committed values

Node Management:
- set_simulation_active(node_id, active): Mark a node as active/inactive in the simulation
- is_simulation_active(node_id): Check whether a node is active in the simulation
- get_failed_nodes(): Get failed nodes (Fault-Tolerant mode)
- get_active_nodes(): Get active nodes (Fault-Tolerant mode)

Monitoring and Debugging:
- get_statistics(): Get consensus statistics and counters
- get_state_info(): Get detailed state information
- get_active_nodes_info(): Get detailed active node information (Fault-Tolerant mode)
Method Usage Examples
def handle_telemetry(self, telemetry: Telemetry) -> None:
    # Check if we're the leader before proposing
    if self.consensus.is_leader():
        # Propose new values
        success = self.consensus.propose_value("sequence", int(telemetry.simulation_time))
        if success:
            print("✅ Value proposed successfully")
        else:
            print("❌ Failed to propose value")

        # Propose multiple values
        self.consensus.propose_value("leader_position", "north")
        self.consensus.propose_value("temperature", 25.5)

    # All nodes can retrieve committed values
    sequence_value = self.consensus.get_committed_value("sequence")
    position_value = self.consensus.get_committed_value("leader_position")
    if sequence_value is not None:
        print(f"Current sequence: {sequence_value}")

    # Get all committed values
    all_values = self.consensus.get_all_committed_values()
    print(f"All committed values: {all_values}")

    # Check consensus state
    is_leader = self.consensus.is_leader()
    leader_id = self.consensus.get_leader_id()
    current_term = self.consensus.get_current_term()
    print(f"Leader: {is_leader}, Leader ID: {leader_id}, Term: {current_term}")
Active Node Information (Fault-Tolerant Mode)
def handle_telemetry(self, telemetry: Telemetry) -> None:
    # Get detailed information about active nodes
    active_info = self.consensus.get_active_nodes_info()
    print(f"Active nodes: {active_info['active_nodes']}")
    print(f"Total nodes: {active_info['total_known']}")
    print(f"Failed nodes: {active_info['failed_nodes']}")
    print(f"Has majority: {active_info['has_majority']}")
    print(f"Detection method: {active_info['detection_method']}")

    # Check if we have quorum
    if self.consensus.has_quorum():
        print("✅ Cluster has quorum")
    else:
        print("❌ Cluster lacks quorum")

    # Check specific nodes
    if self.consensus.is_node_failed(3):
        print("Node 3 is currently failed")
    if self.consensus.is_simulation_active(5):
        print("Node 5 is active in simulation")
Node Failure Simulation
def handle_timer(self, timer: str) -> None:
    if timer == "failure_simulation":
        # Simulate failure of specific nodes
        nodes_to_fail = [1, 2, 3, 4, 5]
        for node_id in nodes_to_fail:
            self.consensus.set_simulation_active(node_id, False)
        print(f"Simulated failure of nodes: {nodes_to_fail}")
    elif timer == "recovery_simulation":
        # Simulate recovery of specific nodes
        nodes_to_recover = [1, 2, 3, 4, 5]
        for node_id in nodes_to_recover:
            self.consensus.set_simulation_active(node_id, True)
        print(f"Simulated recovery of nodes: {nodes_to_recover}")

    # Forward other timers to consensus
    if self.consensus:
        self.consensus.handle_timer(timer)
Always stop the plugin with consensus.stop() during finish() to clean up timers.
Running the Showcase
The repository ships with a ready-to-run demonstration under showcases/raft. It sets up 40 nodes and exercises failure and recovery flows.
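The exact launch command depends on the repository layout; assuming the showcase follows the usual pattern of a top-level entry script (the file name main.py is an assumption, not verified), running it looks like:

python showcases/raft/main.py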
This command loads showcases/raft/protocol.py, which already uses RaftConsensusPlugin directly. Monitor the console output to track leadership changes, proposed values, and simulated failures.
Best Practices
Configuration Guidelines
Election Timeouts
# For stable networks
config.set_election_timeout(150, 300) # 150-300ms
# For unstable networks
config.set_election_timeout(200, 400) # 200-400ms
# For very unstable networks
config.set_election_timeout(300, 600) # 300-600ms
Heartbeat Intervals
# Standard interval
config.set_heartbeat_interval(50) # 50ms
# For high-frequency updates
config.set_heartbeat_interval(25) # 25ms
# For low-frequency updates
config.set_heartbeat_interval(100) # 100ms
Failure Detection Configuration
# Conservative Detection (High reliability, slower detection)
failure_config.set_failure_threshold(5) # 5 consecutive failures
failure_config.set_recovery_threshold(3) # 3 consecutive successes
failure_config.set_detection_interval(3) # Check every 3 heartbeats
failure_config.set_heartbeat_timeout(6) # 6× heartbeat_interval = 300ms timeout
# Aggressive Detection (Fast detection, may have false positives)
failure_config.set_failure_threshold(2) # 2 consecutive failures
failure_config.set_recovery_threshold(1) # 1 consecutive success
failure_config.set_detection_interval(1) # Check every heartbeat
failure_config.set_absolute_timeout(100) # 100ms absolute timeout
# Balanced Detection (Recommended default)
failure_config.set_failure_threshold(3) # 3 consecutive failures
failure_config.set_recovery_threshold(2) # 2 consecutive successes
failure_config.set_detection_interval(2) # Check every 2 heartbeats
failure_config.set_heartbeat_timeout(4) # 4× heartbeat_interval = 200ms timeout
Mode Selection
Use Classic Mode When:
- Cluster size is fixed and known
- Performance is critical
- Standard Raft behavior is required
- Manual failure management is acceptable
Use Fault-Tolerant Mode When:
- Nodes may fail unexpectedly
- Automatic fault recovery is needed
- Cluster size varies dynamically
- High availability is required
- Active node information is needed
Error Handling
# Always check if consensus is ready
if self.consensus.is_ready():
    # Proceed with operations
    pass

# Handle leader-only operations
if self.consensus.is_leader():
    success = self.consensus.propose_value("var", value)
    if not success:
        print("Failed to propose value")

# Check quorum before critical operations
if self.consensus.has_quorum():
    # Proceed with critical operation
    pass
else:
    print("No quorum available")
Troubleshooting
Common Issues
1. No Leader Elected
# Check if nodes are properly configured
print(f"Known nodes: {self.consensus.get_known_nodes()}")
print(f"Active nodes: {self.consensus.get_active_nodes()}")
# Check if cluster has quorum
if self.consensus.has_quorum():
    print("Cluster has quorum")
else:
    print("Cluster lacks quorum - add more nodes")
2. Values Not Committing
# Check if we're the leader
if not self.consensus.is_leader():
    print("Not the leader - cannot propose values")

# Check if we have quorum
if not self.consensus.has_quorum():
    print("No quorum - cannot commit values")
# Check consensus variables
variables = self.consensus.get_consensus_variables()
print(f"Available variables: {list(variables.keys())}")
3. High Election Frequency
# Increase election timeout
config.set_election_timeout(300, 600) # Longer timeouts
# Increase heartbeat frequency
config.set_heartbeat_interval(25) # More frequent heartbeats
# Check network stability
failed_nodes = self.consensus.get_failed_nodes()
print(f"Failed nodes: {failed_nodes}")
⚠️ Known Issue: Fault Tolerance Communication Range
Important Warning: When the RAFT plugin is operating in Fault-Tolerant Mode (RaftMode.FAULT_TOLERANT) and the communication range is not large enough to reach all nodes in the swarm, the system may not converge to stability, and consecutive elections may be triggered repeatedly.
Problem Description:
- In Fault-Tolerant mode, nodes perform active node discovery before elections
- When communication range is limited, the network may form multiple disconnected clusters
- Edge nodes reachable by multiple clusters simultaneously become a critical issue
- These edge nodes may receive conflicting election requests from different clusters
- This leads to incorrect majority calculations and failed leader elections
- The system may enter a cycle of repeated elections without reaching consensus
- Most likely cause: edge nodes being simultaneously accessible by multiple clusters, creating conflicting consensus states
Current Status: This is an open research problem for which a complete solution is not yet known. The issue occurs in distributed systems where:
- Nodes have limited communication range
- The network topology creates disconnected components
- The consensus algorithm requires global knowledge for proper operation
Impact:
- ❌ System may not converge to a stable leader
- ❌ Consecutive election cycles may occur
- ❌ Consensus values may not be committed
- ❌ Performance degradation due to constant re-elections
Workarounds:
1. Increase Communication Range: Ensure all nodes can communicate with each other
2. Use Classic Mode: Switch to RaftMode.CLASSIC for scenarios with limited communication range
3. Network Topology Design: Design the network topology to ensure full connectivity
Call for Contributions: This remains an open research challenge in distributed consensus algorithms. We invite researchers and developers to:
- Research Solutions: Investigate novel approaches to consensus in limited-range networks
- Propose Algorithms: Develop new consensus algorithms for disconnected networks
- Test Scenarios: Create test cases and benchmarks for this problem
- Document Solutions: Share findings and potential solutions
- Collaborate: Work together to find a robust solution
Contact for Contributions: If you have ideas, solutions, or want to contribute to solving this problem, please contact:
- Email: llucchesi@inf.puc-rio.br
- Subject: "RAFT Plugin - Fault Tolerance Communication Range Issue"
Debugging Tips
Enable Logging
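As covered in the Logging section, DEBUG-level output is usually the fastest way to trace elections, message exchanges, and state transitions:

config.set_logging(enable=True, level="DEBUG")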
Monitor Node States
# Get detailed state information
state_info = self.consensus.get_state_info()
print(f"Current state: {state_info['state']}")
print(f"Current term: {state_info['current_term']}")
print(f"Leader ID: {state_info['leader_id']}")
Check Network Connectivity
# In fault-tolerant mode, check active nodes
active_info = self.consensus.get_active_nodes_info()
print(f"Active nodes: {active_info['active_nodes']}")
print(f"Failed nodes: {active_info['failed_nodes']}")
# Check majority calculation
majority_info = self.consensus.get_majority_info()
print(f"Majority needed: {majority_info['majority_needed']}")
print(f"Has majority: {majority_info['has_majority']}")
Performance Monitoring
# Get consensus statistics
stats = self.consensus.get_statistics()
print(f"Proposals made: {stats['proposals_made']}")
print(f"Values committed: {stats['values_committed']}")
print(f"Election attempts: {stats['election_attempts']}")
# Get failure detection metrics
metrics = self.consensus.get_failure_detection_metrics()
print(f"Detection latency: {metrics['detection_latency_ms']}ms")
print(f"Success rate: {metrics['success_rate_percent']}%")
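One way to sample these metrics periodically is the timer pattern from the Quick Start; the timer name stats_timer below is arbitrary:

def initialize(self) -> None:
    ...  # consensus setup as shown in Quick Start
    self.provider.schedule_timer("stats_timer", self.provider.current_time() + 1)

def handle_timer(self, timer: str) -> None:
    if timer == "stats_timer":
        stats = self.consensus.get_statistics()
        print(f"proposals={stats['proposals_made']} "
              f"commits={stats['values_committed']} "
              f"elections={stats['election_attempts']}")
        # Re-arm the timer to sample once per simulated second
        self.provider.schedule_timer("stats_timer", self.provider.current_time() + 1)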
API Quick Reference
This section provides a comprehensive reference of all available methods in the RAFT consensus plugin, organized by functionality.
Configuration Methods
All examples below assume the import used throughout this guide: from gradysim.protocol.plugin.raft import RaftConfig, RaftMode, RaftConsensusPlugin.
RaftConfig Methods
| Method | Description | Example |
|---|---|---|
| set_election_timeout(min_ms, max_ms) | Set randomized election timeout range | config.set_election_timeout(150, 300) |
| set_heartbeat_interval(ms) | Set leader heartbeat interval | config.set_heartbeat_interval(50) |
| add_consensus_variable(name, type) | Add a consensus variable | config.add_consensus_variable("sequence", int) |
| set_logging(enable, level) | Configure logging | config.set_logging(True, "INFO") |
| set_raft_mode(mode) | Set operation mode | config.set_raft_mode(RaftMode.FAULT_TOLERANT) |
| get_failure_config() | Get failure detection config | failure = config.get_failure_config() |
Lifecycle Methods
| Method | Description | Example |
|---|---|---|
| start() | Start the consensus process | consensus.start() |
| stop() | Stop the consensus process | consensus.stop() |
| is_ready() | Check if the system is ready | if consensus.is_ready(): |
Leadership and State Methods
| Method | Description | Example |
|---|---|---|
| is_leader() | Check if this node is leader | if consensus.is_leader(): |
| get_leader_id() | Get current leader ID | leader_id = consensus.get_leader_id() |
| get_current_state() | Get current Raft state | state = consensus.get_current_state() |
| get_current_term() | Get current term number | term = consensus.get_current_term() |
Value Management Methods
| Method | Description | Example |
|---|---|---|
| propose_value(name, value) | Propose new value (leader only) | consensus.propose_value("sequence", 42) |
| get_committed_value(name) | Get committed value | value = consensus.get_committed_value("sequence") |
| get_all_committed_values() | Get all committed values | values = consensus.get_all_committed_values() |
| get_consensus_variables() | Get configured variables | vars = consensus.get_consensus_variables() |
| has_consensus_variable(name) | Check if variable exists | if consensus.has_consensus_variable("sequence"): |
Node Management Methods
| Method | Description | Example |
|---|---|---|
| set_known_nodes(node_ids) | Set known node IDs | consensus.set_known_nodes([1, 2, 3, 4, 5]) |
| set_simulation_active(node_id, active) | Set node simulation state | consensus.set_simulation_active(1, False) |
| is_simulation_active(node_id) | Check if node is active | if consensus.is_simulation_active(1): |
| get_failed_nodes() | Get failed nodes | failed = consensus.get_failed_nodes() |
| get_active_nodes() | Get active nodes | active = consensus.get_active_nodes() |
| is_node_failed(node_id) | Check if node failed | if consensus.is_node_failed(1): |
Active Node Information Methods (Fault-Tolerant Mode)
| Method | Description | Example |
|---|---|---|
| get_active_nodes_info() | Get detailed active node info | info = consensus.get_active_nodes_info() |
| get_communication_failed_nodes() | Get communication-failed nodes | failed = consensus.get_communication_failed_nodes() |
| get_communication_active_nodes() | Get communication-active nodes | active = consensus.get_communication_active_nodes() |
| is_communication_failed(node_id) | Check communication failure | if consensus.is_communication_failed(1): |
Quorum and Majority Methods
| Method | Description | Example |
|---|---|---|
| has_quorum() | Check if cluster has quorum | if consensus.has_quorum(): |
| has_majority_votes() | Check if has majority votes | if consensus.has_majority_votes(): |
| has_majority_confirmation() | Check majority confirmation | if consensus.has_majority_confirmation(): |
| get_majority_info() | Get majority information | info = consensus.get_majority_info() |
Monitoring and Debugging Methods
| Method | Description | Example |
|---|---|---|
| get_statistics() | Get consensus statistics | stats = consensus.get_statistics() |
| get_state_info() | Get detailed state info | info = consensus.get_state_info() |
| get_configuration() | Get current configuration | config = consensus.get_configuration() |
| get_failure_detection_metrics() | Get failure detection metrics | metrics = consensus.get_failure_detection_metrics() |
Cluster Management Methods
| Method | Description | Example |
|---|---|---|
| set_cluster_id(cluster_id) | Set cluster ID | consensus.set_cluster_id(1) |
| get_cluster_id() | Get cluster ID | id = consensus.get_cluster_id() |
| is_in_same_cluster(node_id) | Check if in same cluster | if consensus.is_in_same_cluster(2): |
Complete Usage Example
from gradysim.protocol.interface import IProtocol
from gradysim.protocol.messages.telemetry import Telemetry
from gradysim.protocol.plugin.raft import RaftConfig, RaftMode, RaftConsensusPlugin

class RaftProtocol(IProtocol):
    def initialize(self) -> None:
        # Configuration
        config = RaftConfig()
        config.set_election_timeout(150, 300)
        config.set_heartbeat_interval(50)
        config.add_consensus_variable("sequence", int)
        config.set_logging(enable=True, level="INFO")
        config.set_raft_mode(RaftMode.FAULT_TOLERANT)

        # Initialize consensus
        self.consensus = RaftConsensusPlugin(config=config, protocol=self)
        self.consensus.set_known_nodes(list(range(10)))
        self.consensus.start()

    def handle_telemetry(self, telemetry: Telemetry) -> None:
        # Leadership and value management
        if self.consensus.is_leader():
            self.consensus.propose_value("sequence", int(telemetry.simulation_time))

        # Get committed values
        sequence = self.consensus.get_committed_value("sequence")
        if sequence is not None:
            print(f"Current sequence: {sequence}")

        # Check consensus state
        if self.consensus.is_leader():
            print(f"Node {self.consensus.get_leader_id()} is leader in term {self.consensus.get_current_term()}")

        # Monitor active nodes (fault-tolerant mode)
        active_info = self.consensus.get_active_nodes_info()
        print(f"Active nodes: {active_info['active_nodes']}")
        print(f"Has majority: {active_info['has_majority']}")

        # Check quorum
        if not self.consensus.has_quorum():
            print("Warning: No quorum available")

    def finish(self) -> None:
        # Always stop consensus
        self.consensus.stop()