Interaction with the ElectricGridEnv

In the previous section it was shown how to set up an environment. This section shows how to interact with the environment and, above all, how to extract and store the simulated data. It covers the following topics:

  • Apply actions,

  • Data hook (how to tap the simulated data stream),

  • AC grid example.

The interactive content related to this section can be found in the form of a notebook here.

Basic interaction with the environment

The dynamic behaviour of the environment is simulated using linear state-space models. It interacts step-wise with the agent/controller as shown in the figure below: based on the input/action u at timestep k, the next state x is calculated.

Based on that action u_k and the internal state-space model, the system response is evolved for one timestep and the new states x_k+1 are calculated. The state-space model is defined depending on the electric components - for more information about the underlying ordinary differential equations, see The NodeConstructor - Theory.
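As a toy illustration, one such discrete-time step looks like the sketch below. The matrices A and B here are made-up values; ElectricGrid derives the actual ones from the grid topology and component parameters.

```julia
# One step of a discrete linear state-space model: x_{k+1} = A*x_k + B*u_k
# A and B are toy values, NOT the matrices ElectricGrid builds internally.
A = [0.9 0.1; 0.0 0.8]    # state transition matrix
B = [0.1, 0.2]            # input matrix
x_k = [0.0, 0.0]          # states at timestep k
u_k = 0.5                 # action at timestep k
x_k1 = A * x_k + B * u_k  # states at timestep k+1
```

Repeating this update once per sampling period ts is all the env does dynamically; everything else (limits, logging, rewards) is bookkeeping around it.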

In the following it is described how to use the parameter dict introduced earlier to define a simple, small example env and learn how to interact with it. This environment consists of a single-phase electrical power grid with 1 source and 1 load, as shown in the figure below. For reasons of clarity, only phase a is shown:

To get an env consisting of that specific setting with the correct filter type and load, the parameter dict is defined beforehand and handed over to the env. Instead of num_sources and num_loads, the parameter dict and the connection matrix CM are now used. CM defines whether there is a connection between two nodes (e.g., source <-> load) or not: if there is a connection, the corresponding matrix entry is non-zero; if there is no connection, the entry is 0. For more information about the CM matrix see The NodeConstructor - Application.

To create a useful example, we first calculate a load whose power rating fits the power of the source. For this, the function ParallelLoadImpedance() provided by the ElectricGrid package is used, which calculates the passive parameters of a load for a specified apparent power.

using ElectricGrid

S_source = 200e3

S_load = 150e3
pf_load = 1
v_rms = 230
R_load, L_load, X, Z = ParallelLoadImpedance(S_load, pf_load, v_rms)

Then we use these values in the parameter dict when defining the env.

CM = [0. 1.
    -1. 0.]

parameters = Dict{Any, Any}(
        "source" => Any[
                        Dict{Any, Any}("pwr" => S_source, "control_type" => "classic", "mode" => "Step", "fltr" => "LC")
                        ],
        "load"   => Any[
                        Dict{Any, Any}("impedance" => "R", "R" => R_load)
                        ],
        "cable"  => Any[
                        Dict{Any, Any}("R" => 1e-3, "L" => 1e-4, "C" => 1e-4, "i_limit" => 1e4, "v_limit" => 1e4)
                        ],
        "grid"   => Dict{Any, Any}("fs" => 1e4, "phase" => 3, "v_rms" => 230, "f_grid" => 50, "ramp_end" => 0.0)
    )

env = ElectricGridEnv(CM = CM, parameters = parameters)

env.state_ids

5-element Vector{String}:

As can be seen, the five states marked in the equivalent circuit diagram in the figure above can be found via env.state_ids (the suffix _a marking the phase is not displayed in the figure). Analogously, the action_ids can be found, which consist of 3 entries - one per phase for the defined source.

env.action_ids

3-element Vector{String}:

To interact with the env, the function env(action) can be called:

env([0.2, 0.2, 0.2])

Here, the source gets an action of 0.2 on all three phases. This interaction can be done in a loop while the state is logged during the process:

# run 3 steps
for _ in 1:3
    env([0.2, 0.2, 0.2])
end

env.state[1:5]  # print the first five states after 3 steps
5-element Vector{Float64}:
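The same step-and-log pattern can be shown self-contained with a toy stand-in for the env (a simple first-order lag instead of the real grid state-space model):

```julia
# Toy stand-in for the env: a first-order lag pulled toward the action value.
# This only illustrates the step-and-log pattern, not the real grid dynamics.
toy_state = 0.0
toy_step(x, u) = 0.9 * x + 0.1 * u

log = Float64[]
for _ in 1:3
    global toy_state = toy_step(toy_state, 0.2)
    push!(log, toy_state)  # log the state after each step
end
```

With the real env, `env([0.2, 0.2, 0.2])` plays the role of `toy_step` and `env.state` is what would be logged.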

The control_type of the source is chosen as classic in mode => Step. That means it runs open-loop and a step is applied to the input, resulting in an input voltage of parameters["grid"]["v_rms"]. For more information about possible control modes, see the ClassicalController_Notebook or the documentation.

The MultiController

The ElectricGrid toolbox provides a more enhanced method to run an experiment with a specific number of steps and even several episodes. It is based on the run command provided by the ReinforcementLearning.jl toolbox and can therefore be used to learn a control task or to simulate a fixed setup.

First we take a look at simulating the behaviour of an env without learning. For this, the Simulate() function from the MultiController is used. It acts as a configuration tool to define the corresponding classic controllers and RL agents. Additionally, it links inputs to the env.state_ids and outputs to the env.action_ids.

In our example, we only apply classic controllers to learn how the tool works (learning with RL agents is investigated in RL_Single_Agent_DEMO.ipynb). The interaction of the different components (env, agent, classic controllers, wrapper,...) is shown in the diagram below.

The depicted multi-agent structure ensures that the states and actions of every source are exchanged with the correct agent or classic controller, depending on the control_types and modes. For this, the SetupAgents(env) method returns a Controller of type MultiController.

Then the experiment is executed for (by default) one episode. The length of an episode is defined by the parameter env.maxsteps (default = 500), which can alternatively be defined via env.t_end and env.ts. Inside Simulate() the run command (RL.base) executes 500 steps, applying the defined controllers to the env. In every step the control inputs are calculated based on the corresponding states, and the env is evolved for ts. Then the new state is handed to the controller/agent again.
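The relation between episode length, end time and sampling time mentioned above boils down to a single division (values taken from this example; the variable names mirror the env fields described in the text):

```julia
# Episode length from simulation end time and sampling time
t_end = 0.05                        # s, total episode time
ts    = 1e-4                        # s, sampling time (1 / fs)
maxsteps = round(Int, t_end / ts)   # 500 steps per episode
```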

The 500 steps are only executed if none of the defined voltage and current limits of the states are exceeded (exceeding them would be equivalent to a system crash). In that case, a flag in the env is set (env.done = true), which terminates the episode.

To investigate the env <-> agent interaction (without learning), the Simulate() method can be used.

Controller = SetupAgents(env);
hook = Simulate(Controller, env);

If no hook is defined beforehand and handed over, the Simulate() method returns a default hook.

Data-Hook (how to tap the simulated data stream)

To collect the data - i.e., how the states evolve over time under the chosen actions - hooks (ReinforcementLearning.jl/data-hooks) are used, which are executed at different stages of the experiment. Via these hooks the collected data is stored in a DataFrame labelled via the state and action IDs. This all happens using the DataHook provided by the ElectricGrid package.
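The idea behind such data hooks can be sketched in plain Julia (this only illustrates the callback pattern, not the actual ReinforcementLearning.jl hook interface):

```julia
# A minimal "hook": a callable struct invoked after every step to tap the data stream.
struct SimpleHook
    log::Vector{Float64}
end
(h::SimpleHook)(state) = push!(h.log, state)

hook = SimpleHook(Float64[])
let state = 0.0
    for _ in 1:5
        state += 1.0   # stand-in for one env step
        hook(state)    # the hook records the new state
    end
end
```

The real DataHook works the same way in spirit, but stores the tapped values in a labelled DataFrame instead of a plain vector.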

This DataFrame can be seen in the output of the cell above. It contains 500 rows - one for each simulated step. The different columns store the information about the simulation, e.g., the number of the episode, the time, the states, the actions,... By default a DefaultDataHook is generated which collects all available information about the states of the env.

If we are interested in how the current through the filter inductor of source 1 (the first state in the env, source1_i_L1_a) evolves over time, we can grab the data as follows:

id = env.state_ids[1] # get first state_id
hook.df[:, id]

500-element Vector{Float64}:

If we only want to log specific states during the experiment, a DataHook can be defined beforehand. During initialisation it only has to be defined which states and actions should be logged. Then the hook has to be handed over to the Simulate() (or Learn()) command.

In the following, we will collect only the state "source1_i_L1_a" and the first action "source1_u_a". Alternatively, for example, the number of the source can be used to collect all the data for a specific source (compare the commented-out example code). Many other flags can be set to collect more measurements (RMS, reward, powers,...). For more details we refer to the documentation and the other example notebooks.

hook = DataHook(collect_state_ids = ["source1_i_L1_a"],
                collect_action_ids = ["source1_u_a"]
                #collect_sources  = [1]  # alternative: collect all data of source 1
                )

Simulate(Controller, env, hook=hook);


500 rows × 10 columns (omitted printing of 4 columns)

| episode | time   | source1_i_L1_a | source1_v_L1_a | action                | next_state_source1_i_L1_a |
|---------|--------|----------------|----------------|-----------------------|---------------------------|
| 1       | 0.0    | 0.0            | 0.0            | [0.575, 0.575, 0.575] | 0.0                       |
| 1       | 0.0001 | 0.0            | 0.0            | [0.575, 0.575, 0.575] | 64.3879                   |
| 1       | 0.0002 | 64.3879        | 205.206        | [0.575, 0.575, 0.575] | 118.512                   |
| 1       | 0.0003 | 118.512        | 163.059        | [0.575, 0.575, 0.575] | 159.662                   |
| 1       | 0.0004 | 159.662        | 118.471        | [0.575, 0.575, 0.575] | 188.416                   |
| 1       | 0.0005 | 188.416        | 79.1171        | [0.575, 0.575, 0.575] | 206.748                   |
| 1       | 0.0006 | 206.748        | 47.433         | [0.575, 0.575, 0.575] | 216.964                   |
| 1       | 0.0007 | 216.964        | 23.6258        | [0.575, 0.575, 0.575] | 221.293                   |
| 1       | 0.0008 | 221.293        | 7.07961        | [0.575, 0.575, 0.575] | 221.713                   |
| 1       | 0.0009 | 221.713        | -3.29035       | [0.575, 0.575, 0.575] | 219.837                   |
| 1       | 0.001  | 219.837        | -8.84257       | [0.575, 0.575, 0.575] | 216.869                   |
| 1       | 0.0011 | 216.869        | -10.9624       | [0.575, 0.575, 0.575] | 213.633                   |
| 1       | 0.0012 | 213.633        | -10.866        | [0.575, 0.575, 0.575] | 210.628                   |
| 1       | 0.0013 | 210.628        | -9.51317       | [0.575, 0.575, 0.575] | 208.116                   |
| 1       | 0.0014 | 208.116        | -7.60108       | [0.575, 0.575, 0.575] | 206.185                   |
| 1       | 0.0015 | 206.185        | -5.59584       | [0.575, 0.575, 0.575] | 204.82                    |
| 1       | 0.0016 | 204.82         | -3.77656       | [0.575, 0.575, 0.575] | 203.942                   |
| 1       | 0.0017 | 203.942        | -2.28148       | [0.575, 0.575, 0.575] | 203.448                   |
| 1       | 0.0018 | 203.448        | -1.15116       | [0.575, 0.575, 0.575] | 203.234                   |
| 1       | 0.0019 | 203.234        | -0.365185      | [0.575, 0.575, 0.575] | 203.206                   |
| 1       | 0.002  | 203.206        | 0.129335       | [0.575, 0.575, 0.575] | 203.288                   |
| 1       | 0.0021 | 203.288        | 0.397567       | [0.575, 0.575, 0.575] | 203.423                   |
| 1       | 0.0022 | 203.423        | 0.503928       | [0.575, 0.575, 0.575] | 203.573                   |
| 1       | 0.0023 | 203.573        | 0.504717       | [0.575, 0.575, 0.575] | 203.713                   |
| ⋮       | ⋮      | ⋮              | ⋮              | ⋮                     | ⋮                         |

Investigating the DataFrame we can see that even more states than requested are logged. The ElectricGrid framework stores all information available with regard to the chosen state. Here, additionally, the voltage across the inductor and the state after the action is applied are stored as well. For more detailed information, see the API documentation -> DataHook.

After the experiment, the RenderHookResults() function provided by the ElectricGrid package can be used to show the results of the simulated episode. States and actions which are logged with the DataHook can be selected to be plotted:

RenderHookResults(hook = hook, 
                    episode = 1,
                    states_to_plot  = ["source1_i_L1_a"],  
                    actions_to_plot = ["source1_u_a"])

As can be seen, the state "source1_i_L1_a" (belonging to the left y-axis) and the action "source1_u_a" (right y-axis) are plotted over the 500 timesteps. One timestep takes ts = 1e-4 s, resulting in an episode time of 0.05 s. Here, the plotted/collected states and actions are not normalised.

Above, the control_type of the source was chosen as classic in mode => Step. This implements an open-loop circuit: as action, a step is applied to the inputs, resulting in an input voltage of parameters["grid"]["v_rms"]. For more information about possible control modes, see the ClassicalController_Notebook or the documentation.

AC grid example

To run an AC grid in which the source creates a sinusoidal voltage with the frequency parameters["grid"]["f_grid"], the control mode just has to be changed from Step to Swing.
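In Swing mode the source behaves like an ideal sinusoidal voltage source. The idealised phase-a voltage it produces can be sketched as follows (using the grid values from this example; the formula is the standard RMS-to-peak relation, not code from the package):

```julia
# Idealised phase-a voltage of an ideal sinusoidal (Swing) source:
# v(t) = sqrt(2) * v_rms * sin(2π * f_grid * t)
v_rms  = 230.0   # RMS voltage, parameters["grid"]["v_rms"]
f_grid = 50.0    # grid frequency, parameters["grid"]["f_grid"]
v(t) = sqrt(2) * v_rms * sin(2π * f_grid * t)
v(0.005)   # peak value ≈ 325.27 V at a quarter period (1/(4*50) = 5 ms)
```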

CM = [0. 1.
    -1. 0.]

parameters = Dict{Any, Any}(
        "source" => Any[
                        Dict{Any, Any}("pwr" => S_source, "control_type" => "classic", "mode" => "Swing", "fltr" => "LC", "i_limit" => 1e4, "v_limit" => 1e4)
                        ],
        "load"   => Any[
                        Dict{Any, Any}("impedance" => "R", "R" => R_load)
                        ],
        "cable"  => Any[
                        Dict{Any, Any}("R" => 1e-3, "L" => 1e-4, "C" => 1e-4)
                        ],
        "grid"   => Dict{Any, Any}("fs" => 1e4, "phase" => 3, "v_rms" => 230, "f_grid" => 50, "ramp_end" => 0.0)
    )

env = ElectricGridEnv(CM = CM, parameters = parameters)

states_to_plot  = ["source1_v_C_filt_a", "source1_v_C_filt_b", "source1_v_C_filt_c"]
actions_to_plot = ["source1_u_a"]

hook = DataHook(collect_state_ids = states_to_plot,
                collect_action_ids = actions_to_plot)

Controller = SetupAgents(env)
Simulate(Controller, env, hook=hook)

RenderHookResults(hook = hook,
                  states_to_plot  = states_to_plot,
                  actions_to_plot = actions_to_plot)

Based on this introduction, more enhanced environments can be created, consisting of more sources and loads connected to each other, which can then be, e.g.,

  • controlled by classic grid controllers to share the power based on the droop concept –> See: Classic_Controllers_Notebooks,
  • used to train RL agents to fulfill a current control task for single sources –> See: RL_Single_Agent_DEMO.ipynb,
  • used to train RL agents to interact in a larger power grid with classically controlled sources –> See: RL_Classical_Controllers_Merge_DEMO.ipynb.