# CentralizedQLearning
## Description
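Implements a centralized Q-learning module for the simulated scenario. In training mode the module learns a centralized Q-table and exports it when the simulation ends; in testing mode it imports a previously trained Q-table and uses it to make decisions. The parameters below configure the learning algorithm and the scenario.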
## Parameters
| Name | Type | Unit | Default value | Description |
|---|---|---|---|---|
| trainingMode | bool | | true | Flag that dictates whether the module runs in training or testing mode. If set to true, the module uses the hyperparameters below and trains with a centralized Q-learning algorithm; the resulting Q-table is exported when the simulation ends. Otherwise, the module imports the Q-table and uses it to make decisions. |
| qTablePath | string | | | Path to the centralized Q-table. If trainingMode is set to true, the table is exported to this path; otherwise it is imported from this path. |
| learningRate | double | | 0.1 | Hyperparameters for the centralized Q-learning algorithm (see the update sketch after the table). |
| gamma | double | | 0.99 | |
| epsilonDecayStrategy | int | | 2 | Defines the strategy used to decay epsilon from epsilonStart to epsilonEnd (see the decay sketch after the table). The possible strategies are: 1 = "linear": over epsilonHorizon training steps, epsilon decays linearly from epsilonStart to epsilonEnd; 2 = "exponential": over epsilonHorizon training steps, epsilon decays exponentially from epsilonStart to epsilonEnd; 3 = "steps": epsilon decays by (epsilonEnd - epsilonStart) / epsilonSteps every epsilonSteps training steps. |
| epsilonStart | double | | 1 | |
| epsilonEnd | double | | 0.001 | |
| epsilonHorizon | int | | 100000 | |
| epsilonSteps | int | | 10 | Only relevant to the "steps" strategy. |
| epsilonShortCircuit | bool | | false | If set to true, the simulation ends automatically when the epsilon horizon is reached. |
| startFromScratch | bool | | true | Determines whether this simulation starts from scratch, ignoring any previous Q-tables, or reuses an existing one and adds knowledge to it, if available. |
| timeInterval | double | s | 100ms | Parameters for the simulated scenario. |
| distanceInterval | double | m | 80m | |
| communicationStorageInterval | double | | 80 | |
| sensorStorageTolerance | double | | 80 | The value stored in each index of the agent's local state is divided by communicationStorageInterval and rounded up (see the discretization sketch after the table). |
| maxDiscreteAgentPackets | double | | 3 | |
| maxDiscreteAwaitingPackets | double | | 3 | |
| costFunction | double | | 1 | |
| agentWeight | double | | 1 | |
| sensorWeight | double | | 1 | |
| throughputWeight | double | | 1 | |
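
The learningRate and gamma parameters play their usual roles in the tabular Q-learning update. The following is a minimal sketch of that update, not the module's actual code; the `update` helper, the state/action encoding, and the reward are assumptions for illustration.

```python
from collections import defaultdict

learning_rate = 0.1   # learningRate parameter
gamma = 0.99          # gamma (discount factor) parameter

# Q-table mapping (state, action) pairs to Q-values; unseen pairs start at 0.
q_table = defaultdict(float)

def update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += learning_rate * (td_target - q_table[(state, action)])
```

With the defaults (learningRate = 0.1, gamma = 0.99), each update moves Q(s, a) 10% of the way toward the one-step bootstrapped target.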
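The three epsilonDecayStrategy values can be read as in the sketch below. This is one plausible reading of the descriptions in the table, not the module's implementation; the function name `epsilon_at` and the clamping at epsilonEnd are assumptions.

```python
def epsilon_at(step, strategy,
               epsilon_start=1.0, epsilon_end=0.001,
               epsilon_horizon=100_000, epsilon_steps=10):
    """Epsilon after `step` training steps under the three decay strategies."""
    if step >= epsilon_horizon:
        return epsilon_end
    if strategy == 1:  # linear: straight line from epsilonStart to epsilonEnd
        return epsilon_start + (epsilon_end - epsilon_start) * step / epsilon_horizon
    if strategy == 2:  # exponential: constant per-step ratio reaching epsilonEnd at the horizon
        rate = (epsilon_end / epsilon_start) ** (1.0 / epsilon_horizon)
        return epsilon_start * rate ** step
    if strategy == 3:  # steps: fixed decrement every epsilonSteps training steps
        drops = step // epsilon_steps
        return max(epsilon_end,
                   epsilon_start + (epsilon_end - epsilon_start) / epsilon_steps * drops)
    raise ValueError("unknown epsilonDecayStrategy")
```

With the defaults (strategy 2), epsilon decays exponentially from 1 to 0.001 over 100000 training steps.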
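The divide-and-round-up rule described for the storage parameters amounts to bucketing each local-state entry. The sketch below illustrates this; the cap by maxDiscreteAgentPackets is a hypothetical assumption, not stated in the table.

```python
import math

communication_storage_interval = 80.0  # communicationStorageInterval parameter
max_discrete_agent_packets = 3         # hypothetical cap on the bucket index (assumption)

def discretize(value):
    """Divide a local-state entry by communicationStorageInterval and round up."""
    bucket = math.ceil(value / communication_storage_interval)
    return min(bucket, max_discrete_agent_packets)  # assumed clamp; not confirmed by the docs
```

For example, `discretize(150)` returns ceil(150 / 80) = 2.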