In a field like alignment or embedded agency, it’s useful to keep a list of one or two dozen ideas which seem like they should fit neatly into a full theory, although it’s not yet clear how. When working on a theoretical framework, you regularly revisit each of those ideas, and think about how it fits in. Every once in a while, a piece will click, and another large chunk of the puzzle will come together.
Selection vs control is one of those ideas. It seems like it should fit neatly into a full theory, but it’s not yet clear what that will look like. I revisit the idea pretty regularly (maybe once every 3-4 months) to see how it fits with my current thinking. It has not yet had its time, but I expect it will (that’s why it’s on the list, after all).
Bearing in mind that the puzzle piece has not yet properly clicked, here are some current thoughts on how it might connect to other pieces:
Selection and control have different type signatures.
A selection process optimizes for the values of variables in some model, which may or may not correspond to anything in the real world. Human values seem to be like this—see Human Values Are A Function Of Humans’ Latent Variables.
A control process, on the other hand, directly optimizes things in its environment. A thermostat, for instance, does not necessarily contain any model of the temperature a few minutes in the future; it just directly optimizes the value of the temperature a few minutes in the future.
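To make the type-signature difference concrete, here is a minimal Python sketch (all names and numbers are mine, invented for illustration): the selector only ever touches candidate values and a model that scores them, while the controller only ever touches the actual environment variable it is pushing on.

```python
import random

# Controller: a thermostat-style process. It reads the actual environment
# variable and pushes on it directly; no model of future temperature appears
# anywhere in this code.
def thermostat_step(current_temp, setpoint=20.0):
    """Return a heating adjustment based only on the observed temperature."""
    return 0.5 if current_temp < setpoint else -0.5

# Selector: optimization over variables in a model. The candidates and the
# score live entirely inside the model; whether the model's variables track
# anything in the real world is a separate question.
def select_best_setting(model, candidate_settings):
    """Pick the candidate whose modeled outcome scores highest."""
    return max(candidate_settings, key=model)

# Toy usage: the selector never touches the environment; the controller
# never consults a model.
comfort_model = lambda setting: -(setting - 21.0) ** 2   # score computed inside the model
print(select_best_setting(comfort_model, [18, 19, 20, 21, 22]))  # -> 21

temp = 17.0
for _ in range(10):
    temp += thermostat_step(temp) + random.gauss(0.0, 0.1)  # direct feedback loop on the real variable
print(round(temp, 1))  # climbs toward the setpoint
```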
The post basically says it, but it’s worth emphasizing: reinforcement learning is a control process, expected utility maximization is a selection process. The difference in type signatures between RL and EU maximization is the same as the difference in type signatures between selection and control.
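One hedged way to see the signature difference, sketched below with type hints I am making up for this comment: in an RL-style update, the environment's feedback appears directly in the signature and no world model does, while in expected utility maximization everything the optimizer touches is a variable of the model it is handed.

```python
from typing import Callable, Iterable, Tuple, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")
Policy = TypeVar("Policy")

# Control-flavored signature (RL-style): the update consumes feedback from the
# actual environment and returns a new policy; no world model in sight.
RLUpdate = Callable[[Policy, State, Action, float], Policy]
#                    policy, observed state, action taken, reward -> new policy

# Selection-flavored signature (EU maximization): the outcome distribution and
# the utility are both variables of the model that gets passed in.
def eu_maximize(
    actions: Iterable[Action],
    model: Callable[[Action], Iterable[Tuple[float, State]]],  # action -> (prob, outcome) pairs
    utility: Callable[[State], float],
) -> Action:
    """Pick the action with the highest expected utility under the model."""
    return max(actions, key=lambda a: sum(p * utility(s) for p, s in model(a)))

# Tiny usage: two actions, a toy model, a toy utility.
outcomes = {"safe": [(1.0, 3.0)], "risky": [(0.5, 0.0), (0.5, 10.0)]}
print(eu_maximize(["safe", "risky"], outcomes.get, lambda s: s))  # -> "risky" (EU 5 > 3)
```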
Inner and outer optimizers can have different type signatures: an outer controller (e.g. RL) can learn an inner selector (e.g. utility maximizer), or an outer selector (e.g. a human) can build an inner controller (e.g. a thermostat), or they could match types with or without matching models/objectives. Which things can even match depends on the types involved—e.g. if one of the two is a controller, it may not have any world-model, so it’s hard to talk about variables in its world-model corresponding to variables in a selector’s world-model.
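A toy sketch of the mismatched case, with every name and number invented here for illustration: the outer loop is a controller (hill-climbing on reward observed from the environment, no world model anywhere), and the thing it ends up tuning is an inner selector (a policy that argmaxes over its own learned model).

```python
import random

def inner_selector(model_param, candidate_actions):
    """Inner selector: pick the action that scores best under the learned model."""
    return max(candidate_actions, key=lambda a: -(a - model_param) ** 2)

def environment_reward(action, hidden_target=7.0):
    """The 'real world': rewards actions near a target the agent never observes directly."""
    return -(action - hidden_target) ** 2 + random.gauss(0.0, 0.1)

# Outer controller: nudge the inner model's parameter up or down based purely on
# observed reward; direct feedback from the environment, no model of it anywhere.
candidate_actions = [float(a) for a in range(11)]
param = 0.0
for _ in range(200):
    trial = param + random.choice([-0.5, 0.5])
    if (environment_reward(inner_selector(trial, candidate_actions))
            > environment_reward(inner_selector(param, candidate_actions))):
        param = trial
print(round(param, 1))  # tends to drift toward ~7, where the inner model's favorite action matches the hidden target
```

The point of the toy: the outer loop’s objective (reward from the hidden target) and the inner model’s implicit objective (closeness to `model_param`) can end up favoring the same actions without being the same object at all, which is the inner/outer mismatch in miniature.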
The Good Regulator Theorem roughly says that the space of optimal controllers always includes a selector (although it doesn’t rule out additional non-selectors in that space).
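For reference, one common informal statement of the underlying Conant–Ashby result, in notation I am choosing here (not the post’s, and not the original paper’s exact formulation):

```latex
% Good Regulator Theorem (Conant & Ashby, 1970), roughly stated.
% S = system state, R = regulator output, Z = outcome, all random variables.
Suppose the outcome $Z$ is determined by the system state $S$ and the
regulator output $R$, and the regulator's policy minimizes the outcome
entropy $H(Z)$. Then among such optimal regulators, the simplest ones
(those with no unnecessary randomness in $R$) satisfy $R = h(S)$ for some
deterministic map $h$; that is, their output factors through a ``model''
of the system.
```

Whether that formal statement maps exactly onto “the space of optimal controllers always includes a selector” is itself part of what hasn’t clicked yet.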