There is a training phase (using 1 to 6 years of weather observations) in which the RL agent trains against the building simulation program. I then evaluate the previously trained agent on 2022 weather data using the same building simulation program. The blue line in the graph shows real measured values from the building in 2022.
Yes. The building has a certain inertia, which is something I hope the agent will learn to exploit as well. The 36-hour outdoor temperature forecast is supplied in the observation state so that the agent knows it should preheat the building when the forecast temperature is dropping, which lowers the heating peak penalty.
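As a rough illustration of what "forecast in the observation state" can look like, here is a minimal sketch in Python. The function name `build_observation`, the choice of indoor and outdoor temperature as the other state variables, and the hourly resolution of the forecast are all assumptions made for the example, not details of the actual setup.

```python
import numpy as np

FORECAST_HOURS = 36  # forecast horizon mentioned in the post


def build_observation(indoor_temp, outdoor_temp, forecast):
    """Concatenate the current temperatures with the outdoor forecast.

    Hypothetical layout: [indoor, outdoor, forecast_1h, ..., forecast_36h].
    Assumes one forecast value per hour, in degrees Celsius.
    """
    forecast = np.asarray(forecast, dtype=np.float32)
    if forecast.shape != (FORECAST_HOURS,):
        raise ValueError(f"expected {FORECAST_HOURS} forecast values")
    current = np.array([indoor_temp, outdoor_temp], dtype=np.float32)
    return np.concatenate((current, forecast))


# Example: a cold front arriving over the next 36 hours.
obs = build_observation(21.0, -2.0, np.linspace(-2.0, -10.0, FORECAST_HOURS))
```

With the falling forecast visible in the observation, the agent at least has the information it needs to start heating before the cold arrives. Whether it learns to do so depends on the reward.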
I'd expect building inertia (along with heater inertia and the inertia of water flow and water temperature) to be important both for pre-heating effectively and for smoothing out any spikes. The other factor in the spikes is probably the cost function: are you modeling constraints like minimum time to heat and the increased maintenance from rapid cycling?
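To make the cost-function question concrete, here is a hedged sketch of a per-step cost that combines energy use, a peak penalty, and a penalty for switching the heater on or off. The function `step_cost`, its parameters, and all the weights are hypothetical and only illustrate the idea. They are not taken from the simulation discussed here.

```python
def step_cost(power_kw, prev_power_kw, peak_so_far_kw,
              energy_price=1.0, peak_price=10.0, switch_price=0.5):
    """Return (cost, updated_peak) for one time step.

    All weights are illustrative assumptions:
    - energy_price: cost per kW drawn in this step
    - peak_price: cost per kW by which the running peak is raised
    - switch_price: fixed cost whenever the heater toggles on/off,
      a crude stand-in for rapid-cycling maintenance wear
    """
    energy = energy_price * power_kw
    new_peak = max(peak_so_far_kw, power_kw)
    peak = peak_price * (new_peak - peak_so_far_kw)
    switched = (power_kw > 0) != (prev_power_kw > 0)
    cycling = switch_price if switched else 0.0
    return energy + peak + cycling, new_peak


# Heater turns on at 5 kW while the running peak was 3 kW.
cost, peak = step_cost(5.0, 0.0, 3.0)
```

A minimum heating time could be handled similarly, for example by penalizing a switch-off that comes too soon after a switch-on. Without some term like this, nothing in the cost discourages the agent from producing short spikes.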