I’m definitely not defending Graziano, as mentioned.
> I do not know any way in which the thermostat might control better by itself containing any model
Let’s say our thermostat had a giant supercomputer cluster and a 1-frame-per-minute camera inside it.
We use self-supervised learning to learn a mapping:
(temperature history (including now), heater setting history (including now), video history (including now)) ↦ (next temperature, next video frame).
This mapping is our generative model, right? And it could grow very sophisticated. Like it could learn that when cocktail glasses appear in the camera frame, then a party is going to start soon, and the people are going to heat up the room in the near future, so we should keep the room on the cooler side right now to compensate.
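To make that concrete, here's a toy pure-Python sketch of the self-supervised part. The room dynamics below are completely made up, and the "model" is just a least-squares fit rather than anything fancy, but it shows the key idea: next-step prediction on the system's own observation history recovers the dynamics with no labels at all.

```python
import random

# Invented toy room dynamics (for illustration only):
#   T_next = 0.9 * T + 3.0 * heater_setting
rng = random.Random(0)
heats = [float(rng.randint(0, 1)) for _ in range(200)]  # random heater on/off history
temps = [15.0]
for h in heats:
    temps.append(0.9 * temps[-1] + 3.0 * h)

# Self-supervised regression: fit T_next ~ a*T + b*u, where the "labels"
# are just the next observations in the history. Normal equations for
# the two-parameter least-squares problem:
x, u, y = temps[:-1], heats, temps[1:]
Sxx = sum(xi * xi for xi in x)
Suu = sum(ui * ui for ui in u)
Sxu = sum(xi * ui for xi, ui in zip(x, u))
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Suy = sum(ui * yi for ui, yi in zip(u, y))
det = Sxx * Suu - Sxu * Sxu
a = (Sxy * Suu - Suy * Sxu) / det
b = (Sxx * Suy - Sxu * Sxy) / det
print(round(a, 3), round(b, 3))  # recovers the true dynamics: 0.9 3.0
```

(Obviously the cocktail-glasses-in-the-video-feed version needs a much richer model class, but it's the same training signal.)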
Then the thermostat can do MPC (model-predictive control), i.e. run through lots of future probabilistic rollouts of the next hour with different possible heater settings, and find the rollout where the temperature is most steady—plus some randomness as discussed next:
> because the controller is controlling the plant, the full dynamics of the plant alone cannot be observed
That’s just explore-versus-exploit, right? You definitely don’t want to always exactly follow the trajectory that is predicted to be optimal. You want to occasionally do other things (a.k.a. explore) to make sure your model is actually correct. I guess some kind of multi-armed bandit algorithm thing?
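If it helps, here's a rough sketch of what that MPC loop plus epsilon-greedy exploration could look like. Everything here (the room dynamics, the horizon, all the constants) is invented for illustration, and a real version would roll out the *learned* generative model rather than this hand-coded one.

```python
import random

SETPOINT = 20.0
HORIZON = 10        # how many steps ahead each rollout looks
N_ROLLOUTS = 200    # candidate heater schedules per decision
EPSILON = 0.05      # small chance of a random action (explore vs. exploit)

def step(temp, heater_on):
    """Toy room: drifts toward 15° outdoors; the heater adds ~1° per step."""
    return temp + 0.1 * (15.0 - temp) + (1.0 if heater_on else 0.0)

def rollout_cost(temp, schedule):
    """Mean-square deviation from the set point along one simulated rollout."""
    cost = 0.0
    for heater_on in schedule:
        temp = step(temp, heater_on)
        cost += (temp - SETPOINT) ** 2
    return cost

def mpc_action(temp, rng):
    if rng.random() < EPSILON:   # occasionally explore instead of exploit
        return rng.random() < 0.5
    schedules = [[rng.random() < 0.5 for _ in range(HORIZON)]
                 for _ in range(N_ROLLOUTS)]
    best = min(schedules, key=lambda s: rollout_cost(temp, s))
    return best[0]               # execute only the first action, then re-plan

rng = random.Random(0)
temp = 18.0
for _ in range(50):
    temp = step(temp, mpc_action(temp, rng))
# temp should now be hovering near the 20.0° set point
```

Re-planning from scratch at every step (only ever executing the first action of the best rollout) is the standard MPC move; the epsilon-greedy bit is one crude way to get the exploration randomness, and a multi-armed-bandit-style scheme would be a more principled substitute.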
> Let’s say our thermostat had a giant supercomputer cluster and a 1-frame-per-minute camera inside it.
This sounds like a product of Sirius Cybernetics Corporation. “It is very easy to be blinded to the essential uselessness of them by the sense of achievement you get from getting them to work at all.”
All you need is a bimetallic strip and a pair of contacts to sense “too high” and “too low”.
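For what it's worth, that bimetallic-strip logic is just bang-bang control with hysteresis, and it really is only a few lines. A minimal sketch (the room dynamics are invented for illustration):

```python
def bang_bang(temp, heater_on, setpoint=20.0, band=0.5):
    """Model-free thermostat: 'too low' closes the contacts, 'too high' opens
    them, and a small hysteresis band keeps the contacts from chattering."""
    if temp < setpoint - band:
        return True        # too low: heat on
    if temp > setpoint + band:
        return False       # too high: heat off
    return heater_on       # inside the band: keep doing what we're doing

# Toy room: drifts toward 15° outdoors; the heater adds ~1° per step.
temp, heater_on = 18.0, False
for _ in range(100):
    heater_on = bang_bang(temp, heater_on)
    temp += 0.1 * (15.0 - temp) + (1.0 if heater_on else 0.0)
# temp now oscillates within roughly a degree of the set point
```

No model, no prediction—and for a plain room with a plain heater, it works fine.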
> Like it could learn that when cocktail glasses appear in the camera frame, then a party is going to start soon, and the people are going to heat up the room in the near future, so we should keep the room on the cooler side right now to compensate.
In other words, control worse now, in order to...what?
> In other words, control worse now, in order to...what?
Suppose the loss is mean-square deviation from the set point. Suppose there’s going to be a giant uncontrollable exogenous heat source soon (crowded party), and suppose there is no cooling system (the thermostat is hooked up to a heater but there is no AC).
Then we’re expecting a huge contribution to the loss function from an upcoming positive temperature deviation. And there’s nothing much the system can do about it once the party is going, other than, obviously, not turning on the heat and making it even worse.
But supposing the system knows this is going to happen, it can keep the room a bit too cool before the party starts. That also incurs a loss, of course. But the way mean-square-loss works is that we come out ahead on average.
Like, if the deviation is 0° now and then +10° midway through the party, that’s higher-loss than −2° now and +8° midway through the party, again assuming loss = mean-square-deviation. 0²+10² > 2²+8², right?
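Checking that arithmetic in code:

```python
# Squared-error loss over the two moments (now, mid-party), in degrees of
# deviation from the set point:
no_precool = [0, 10]    # hold the set point now, then +10° at the party peak
precool    = [-2, 8]    # run 2° cool now, so the party peak is only +8°

def mse_loss(devs):
    return sum(d * d for d in devs)

print(mse_loss(no_precool))  # 0² + 10² = 100
print(mse_loss(precool))     # (−2)² + 8² = 68
```

So under squared-error loss, deliberately "controlling worse" before the party is the lower-loss policy overall.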
> This sounds like a product of Sirius Cybernetics Corporation. “It is very easy to be blinded to the essential uselessness of them by the sense of achievement you get from getting them to work at all.”
> All you need is a bimetallic strip and a pair of contacts to sense “too high” and “too low”.
Well jeez, I’m not proposing that we actually do this! I thought the “giant supercomputer cluster” was a dead giveaway.
If you want a realistic example, I do think the brain uses generative modeling / MPC as part of its homeostatic / allostatic control systems (and motor control and so on). I think there are good reasons that the brain does it that way, and that alternative model-free designs would not work as well (although they would work more than zero).