# CentralizedQLearning

## Description

This module trains or evaluates a centralized Q-learning agent. In training mode it learns a Q-table and exports it when the simulation ends; in testing mode it imports an existing Q-table and uses it to make decisions.

## Parameters
| Name | Type | Unit | Default value | Description |
|---|---|---|---|---|
| trainingMode | bool | - | true | If true, the module trains using a centralized Q-learning algorithm with the hyperparameters below and exports the resulting Q-table after the simulation ends. If false, the module imports the Q-table and uses it to make decisions. |
| qTablePath | string | - | - | Path to the centralized Q-table. If `trainingMode` is true the table is exported to this path; otherwise it is imported from it. |
| learningRate | double | - | 0.1 | Learning rate of the centralized Q-learning algorithm. |
| gamma | double | - | 0.99 | Discount factor of the centralized Q-learning algorithm. |
| epsilonDecayStrategy | int | - | 2 | Strategy used to decay epsilon from `epsilonStart` to `epsilonEnd` over `epsilonHorizon` training steps. 1 = "linear": epsilon decays linearly. 2 = "exponential": epsilon decays exponentially. 3 = "steps": epsilon decays in `epsilonSteps` discrete jumps of `(epsilonEnd - epsilonStart) / epsilonSteps` each. |
| epsilonStart | double | - | 1 | Initial epsilon value. |
| epsilonEnd | double | - | 0.001 | Final epsilon value. |
| epsilonHorizon | int | - | 100000 | Number of training steps over which epsilon decays. |
| epsilonSteps | int | - | 10 | Number of discrete decay jumps; only relevant to the "steps" strategy. |
| epsilonShortCircuit | bool | - | false | If true, the simulation ends automatically when the epsilon horizon is reached. |
| startFromScratch | bool | - | true | If true, the simulation starts from scratch, ignoring any previous Q-tables; otherwise it reuses an existing Q-table, if available, and adds knowledge to it. |
| timeInterval | double | s | 100ms | Parameter of the simulated scenario. |
| distanceInterval | double | m | 80m | Parameter of the simulated scenario. |
| communicationStorageInterval | double | - | 80 | |
| sensorStorageTolerance | double | - | 80 | The values stored at each index of the agent's local state are divided by `communicationStorageInterval` and rounded up. |
| maxDiscreteAgentPackets | double | - | 3 | |
| maxDiscreteAwaitingPackets | double | - | 3 | |
| costFunction | double | - | 1 | |
| agentWeight | double | - | 1 | |
| sensorWeight | double | - | 1 | |
| throughputWeight | double | - | 1 | |
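The three epsilon decay strategies can be pictured as schedules mapping a training step to an epsilon value. The sketch below is illustrative only, not the module's code: the function name `epsilon_at` is made up, the defaults mirror `epsilonStart`, `epsilonEnd`, `epsilonHorizon` and `epsilonSteps`, and for the "steps" strategy we assume the schedule completes within `epsilonHorizon` (one jump every `horizon // n_steps` training steps).

```python
def epsilon_at(step, strategy, start=1.0, end=0.001, horizon=100_000, n_steps=10):
    """Hypothetical epsilon schedule for the three decay strategies.

    strategy: 1 = linear, 2 = exponential, 3 = steps.
    """
    step = min(step, horizon)
    if strategy == 1:
        # Linear interpolation from start to end over the horizon.
        return start + (end - start) * step / horizon
    if strategy == 2:
        # Geometric (exponential) interpolation from start to end.
        return start * (end / start) ** (step / horizon)
    if strategy == 3:
        # Discrete jumps: n_steps decrements of (end - start) / n_steps,
        # one every horizon // n_steps training steps.
        jumps = min(step // (horizon // n_steps), n_steps)
        return start + (end - start) * jumps / n_steps
    raise ValueError("unknown strategy")
```

All three schedules start at `epsilonStart` and reach `epsilonEnd` once `epsilonHorizon` training steps have elapsed.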
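The discretization described for the agent's local state (divide each stored value by the storage interval and round up) can be sketched as follows. The function name `discretize` and the optional cap, included to evoke `maxDiscreteAgentPackets` / `maxDiscreteAwaitingPackets`, are hypothetical and not part of the module's API; the interval default mirrors `communicationStorageInterval = 80`.

```python
import math

def discretize(value, interval=80, max_bucket=None):
    """Divide a raw state value by the storage interval and round up.

    Optionally clamp the resulting bucket to an upper bound (e.g. a
    maximum discrete packet count).
    """
    bucket = math.ceil(value / interval)
    if max_bucket is not None:
        bucket = min(bucket, max_bucket)
    return bucket
```

Coarser intervals shrink the discrete state space, which keeps the Q-table small at the cost of state resolution.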
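For reference, `learningRate` and `gamma` enter the standard tabular Q-learning update rule, and epsilon drives the exploration/exploitation trade-off in an epsilon-greedy policy. This is a minimal sketch under stated assumptions, not the module's implementation: the class and method names are illustrative, and the module's actual state/action encoding and Q-table file format are not specified here.

```python
import random
from collections import defaultdict

class CentralizedQTable:
    """Hypothetical dict-backed Q-table with the documented defaults
    (learningRate = 0.1, gamma = 0.99)."""

    def __init__(self, actions, learning_rate=0.1, gamma=0.99):
        self.q = defaultdict(float)   # (state, action) -> value, defaults to 0
        self.actions = actions
        self.lr = learning_rate
        self.gamma = gamma

    def choose(self, state, epsilon):
        # Epsilon-greedy: explore with probability epsilon, otherwise greedy.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.lr * (td_target - self.q[(state, action)])
```

During training, epsilon is supplied by the configured decay strategy; in testing mode the table would be loaded from `qTablePath` and used greedily (epsilon = 0).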