CentralizedQLearning

Description

Parameters

Name Type Unit Default value Description
trainingMode bool true Flag that selects whether the module runs in training or testing mode.
If set to true, the module uses the hyperparameters below and trains using
a centralized Q-Learning algorithm. The output of this training is a Q Table
exported when the simulation ends. Otherwise, the module imports the Q Table
and uses it to make decisions.
qTablePath string Path to the Centralized Q Table. If "trainingMode" is set to true, the table is exported
to this path; otherwise, it is imported from this path.
learningRate double 0.1 Hyperparameters for the Centralized Q-Learning algorithm
gamma double 0.99
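
As a rough illustration of where learningRate and gamma enter the picture, the sketch below applies the standard textbook tabular Q-learning update. The dictionary-based table, the function name q_update, and the integer state and action encoding are assumptions made for this example, not the module's actual implementation.

```python
from collections import defaultdict

LEARNING_RATE = 0.1  # learningRate default listed above
GAMMA = 0.99         # gamma default listed above


def q_update(q_table, state, action, reward, next_state, actions,
             learning_rate=LEARNING_RATE, gamma=GAMMA):
    """Apply one standard tabular Q-learning update in place.

    q_table maps (state, action) pairs to values; unseen pairs default to 0.
    Returns the updated value for (state, action).
    """
    # Value of the best action available in the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction (learning_rate) toward the target.
    q_table[(state, action)] += learning_rate * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


# Hypothetical usage: an empty table, two actions, one observed transition.
q_table = defaultdict(float)
new_value = q_update(q_table, state=0, action=1, reward=1.0,
                     next_state=2, actions=[0, 1])
```

A larger learningRate makes each update overwrite more of the old estimate, while a gamma close to 1 weights future rewards almost as heavily as immediate ones.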
epsilonDecayStrategy int 2 Defines the strategy used to decay epsilon. A decay strategy defines how the epsilon value decays from epsilonStart to epsilonEnd.
The possible strategies are:
1 = "linear" - Over epsilonHorizon training steps, epsilon decays linearly from epsilonStart to epsilonEnd
2 = "exponential" - Over epsilonHorizon training steps, epsilon decays exponentially from epsilonStart to epsilonEnd
3 = "steps" - Over epsilonHorizon training steps, epsilon decays by ((epsilonEnd - epsilonStart) / epsilonSteps) every epsilonSteps training steps
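
The three strategies can be sketched as a single schedule function. This is a minimal illustration under stated assumptions, not the module's code: the function name epsilon_at is hypothetical, the exponential case is assumed to be a geometric interpolation between epsilonStart and epsilonEnd (which requires both to be positive), and the "steps" case is assumed to split the horizon into epsilonSteps equal plateaus, since the description leaves the exact drop interval open to interpretation.

```python
import math


def epsilon_at(step, strategy, start=1.0, end=0.001, horizon=100000, steps=10):
    """Return epsilon after `step` training steps for the given strategy.

    Defaults mirror epsilonStart, epsilonEnd, epsilonHorizon and epsilonSteps.
    """
    if step >= horizon:
        return end  # the schedule is finished once the horizon is reached
    progress = step / horizon  # fraction of the horizon already consumed
    if strategy == 1:
        # Linear: straight-line interpolation from start to end.
        return start + (end - start) * progress
    if strategy == 2:
        # Exponential (assumed geometric interpolation; needs start, end > 0).
        return start * (end / start) ** progress
    if strategy == 3:
        # Steps (assumed): the horizon is split into `steps` plateaus and
        # epsilon changes by (end - start) / steps at each plateau boundary.
        plateau = math.floor(progress * steps)
        return start + (end - start) * plateau / steps
    raise ValueError(f"unknown epsilonDecayStrategy: {strategy}")


# Hypothetical usage: compare the strategies halfway through the horizon.
halfway = {s: epsilon_at(50000, s) for s in (1, 2, 3)}
```

With the default values, the exponential schedule spends far more of the horizon at low epsilon than the linear one, so it shifts from exploration to exploitation earlier.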
epsilonStart double 1
epsilonEnd double 0.001
epsilonHorizon int 100000
epsilonSteps int 10 Only relevant to the "steps" strategy
epsilonShortCircuit bool false If set to true, the simulation ends automatically when
the epsilon horizon is reached
startFromScratch bool true Determines whether the simulation starts from scratch, ignoring any previous Q Table, or
reuses and adds knowledge to an existing one, if available
timeInterval double s 100ms Parameters for the simulated scenario
distanceInterval double m 80m
communicationStorageInterval double 80
sensorStorageTolerance double 80 The values stored in each of the indexes of the agent's local state are going to be divided
by communicationStorageInterval and rounded up.
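
The divide-and-round-up rule described above can be sketched as follows. The helper name discretize and the sample values are hypothetical; only the division by the interval and the rounding up come from the description.

```python
import math


def discretize(value, interval=80.0):
    """Divide a raw stored value by the interval and round the result up."""
    return math.ceil(value / interval)


# Hypothetical raw values held in an agent's local state.
raw_state = [0, 35, 80, 161]
discrete_state = [discretize(v) for v in raw_state]
```

Rounding up means any non-zero value maps to at least 1, so only an exactly empty entry is encoded as 0.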
maxDiscreteAgentPackets double 3
maxDiscreteAwaitingPackets double 3
costFunction double 1
agentWeight double 1
sensorWeight double 1
throughputWeight double 1