# CentralizedQLearning
## Description
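Implements a centralized Q-learning module for the simulated scenario. In training mode the module learns a centralized Q-table and exports it when the simulation ends; in testing mode it imports a previously trained Q-table and uses it to make decisions. The parameters below configure the learning algorithm and the scenario.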
## Parameters
| Name | Type | Unit | Default value | Description |
|---|---|---|---|---|
| trainingMode | bool | | true | Flag that dictates whether the module runs in training or testing mode. If set to true, the module uses the hyperparameters below and trains with a centralized Q-learning algorithm; the resulting Q-table is exported when the simulation ends. Otherwise, the module imports the Q-table and uses it to make decisions. |
| qTablePath | string | | | Path to the centralized Q-table. If trainingMode is set to true, the table is exported to this path; otherwise it is imported from this path. |
| learningRate | double | | 0.1 | Hyperparameters for the centralized Q-learning algorithm (see the update sketch after the table). |
| gamma | double | | 0.99 | |
| epsilonDecayStrategy | int | | 2 | Defines the strategy used to decay epsilon from epsilonStart to epsilonEnd (see the decay sketch after the table). The possible strategies are: 1 = "linear": over epsilonHorizon training steps, epsilon decays linearly from epsilonStart to epsilonEnd; 2 = "exponential": over epsilonHorizon training steps, epsilon decays exponentially from epsilonStart to epsilonEnd; 3 = "steps": epsilon decays by (epsilonEnd - epsilonStart) / epsilonSteps every epsilonSteps training steps. |
| epsilonStart | double | | 1 | |
| epsilonEnd | double | | 0.001 | |
| epsilonHorizon | int | | 100000 | |
| epsilonSteps | int | | 10 | Only relevant to the "steps" strategy. |
| epsilonShortCircuit | bool | | false | If set to true, the simulation ends automatically when the epsilon horizon is reached. |
| startFromScratch | bool | | true | Determines whether this simulation starts from scratch, ignoring any previous Q-tables, or reuses an existing one and adds knowledge to it, if available. |
| timeInterval | double | s | 100ms | Parameters for the simulated scenario. |
| distanceInterval | double | m | 80m | |
| communicationStorageInterval | double | | 80 | |
| sensorStorageTolerance | double | | 80 | The value stored in each index of the agent's local state is divided by communicationStorageInterval and rounded up (see the discretization sketch after the table). |
| maxDiscreteAgentPackets | double | | 3 | |
| maxDiscreteAwaitingPackets | double | | 3 | |
| costFunction | double | | 1 | |
| agentWeight | double | | 1 | |
| sensorWeight | double | | 1 | |
| throughputWeight | double | | 1 | |
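
The learningRate and gamma parameters play their usual roles in the tabular Q-learning update. The following is a minimal sketch of that update, not the module's actual code; the `update` helper, the state/action encoding, and the reward are assumptions for illustration.

```python
from collections import defaultdict

learning_rate = 0.1   # learningRate parameter
gamma = 0.99          # gamma (discount factor) parameter

# Q-table mapping (state, action) pairs to Q-values; unseen pairs start at 0.
q_table = defaultdict(float)

def update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += learning_rate * (td_target - q_table[(state, action)])
```

With the defaults (learningRate = 0.1, gamma = 0.99), each update moves Q(s, a) 10% of the way toward the one-step bootstrapped target.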
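The three epsilonDecayStrategy values can be read as in the sketch below. This is one plausible reading of the descriptions in the table, not the module's implementation; the function name `epsilon_at` and the clamping at epsilonEnd are assumptions.

```python
def epsilon_at(step, strategy,
               epsilon_start=1.0, epsilon_end=0.001,
               epsilon_horizon=100_000, epsilon_steps=10):
    """Epsilon after `step` training steps under the three decay strategies."""
    if step >= epsilon_horizon:
        return epsilon_end
    if strategy == 1:  # linear: straight line from epsilonStart to epsilonEnd
        return epsilon_start + (epsilon_end - epsilon_start) * step / epsilon_horizon
    if strategy == 2:  # exponential: constant per-step ratio reaching epsilonEnd at the horizon
        rate = (epsilon_end / epsilon_start) ** (1.0 / epsilon_horizon)
        return epsilon_start * rate ** step
    if strategy == 3:  # steps: fixed decrement every epsilonSteps training steps
        drops = step // epsilon_steps
        return max(epsilon_end,
                   epsilon_start + (epsilon_end - epsilon_start) / epsilon_steps * drops)
    raise ValueError("unknown epsilonDecayStrategy")
```

With the defaults (strategy 2), epsilon decays exponentially from 1 to 0.001 over 100000 training steps.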
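The divide-and-round-up rule described for the storage parameters amounts to bucketing each local-state entry. The sketch below illustrates this; the cap by maxDiscreteAgentPackets is a hypothetical assumption, not stated in the table.

```python
import math

communication_storage_interval = 80.0  # communicationStorageInterval parameter
max_discrete_agent_packets = 3         # hypothetical cap on the bucket index (assumption)

def discretize(value):
    """Divide a local-state entry by communicationStorageInterval and round up."""
    bucket = math.ceil(value / communication_storage_interval)
    return min(bucket, max_discrete_agent_packets)  # assumed clamp; not confirmed by the docs
```

For example, `discretize(150)` returns ceil(150 / 80) = 2.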