While I agree with the practical idea conveyed in this post, I think the post’s language is inconsistent, and filling it with alignment slang makes it much less understandable. To demonstrate this, I’ll reformulate the post in the ontology of Active Inference.
Our doctor-to-be does not treat the plan primarily as a prediction about the world; they treat it as a way to make the world be.
In Active Inference, a “prediction” and a “way to make the world be” point to the same thing. A terminological clarification as well: in Active Inference, a plan (also called a policy) is a sequence of actions performed by the agent, interleaved (in the discrete-time formulation) with predicted future states, $\pi = \{a_1, s_1, a_2, s_2, \ldots\}$, not merely a prediction, which is a generative model over states alone: $P(s_{t+1} \mid s_t)$ (in Active Inference, actions are ontologically distinct from states).
Now, when looking for those robust bottlenecks, I’d probably need to come up with a plan. Multiple plans, in fact. The point of “robust bottlenecks” is that they’re bottlenecks to many plans, after all. But those plans would not be optimization targets. I don’t treat the plans as ways to make the world be. Rather, the plans are predictions about how things might go. My “mainline plan”, if I have one, is not the thing I’m optimizing to make happen; rather, it’s my modal expectation for how I expect things to go (conditional on my efforts).
This is exactly what is described by Active Inference. In a simplified, discrete-time formulation, an Active Inference agent endlessly performs the following loop:
1. Receive observations.
2. Update one’s model of the world according to the observations (e. g., using Bayesian inference).
3. “Choose a plan” by sampling from a softmax distribution whose logits are the negative expected free energies associated with each possible plan (see the formula just below this list). Doing this exactly is impossible, of course, because the space of possible plans is infinite, so actual agents heuristically approximate this step.
4. Perform the first action from the chosen plan, and go back to step 1.
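In symbols, and in its simplest form (omitting the precision parameter and the prior over plans that the book’s full treatment includes), the plan-selection step in item 3 is a softmax over the negative expected free energies $G(\pi)$ of the candidate plans:

$$P(\pi) \;=\; \sigma\big(-G(\pi)\big) \;=\; \frac{\exp(-G(\pi))}{\sum_{\pi'} \exp(-G(\pi'))}$$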
A complete, formal mathematical description of this loop is given in section 4.4 of Chapter 4 of Active Inference (the chapter can be downloaded for free from the linked page). In an actual implementation of an Active Inference agent, Fountas et al. used Monte-Carlo tree search as the heuristic for step 3 of the above loop.
Note that in the above loop, the plan is chosen anew (and the whole space of plans is re-constructed) on every iteration. This, again, is an abstract ideal: it would be horribly inefficient to discard plans on every step rather than update them incrementally. Nevertheless, this principle is captured in the adage “plans are worthless, but planning is everything”, and it seems to me that John Wentworth tries to convey the same idea in the passage quoted above.
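For concreteness, here is a minimal, illustrative sketch of this loop in Python. All the names here (world, model, enumerate_policies, expected_free_energy) are hypothetical placeholders rather than the API of any real Active Inference library, and the exhaustive enumeration of plans stands in for the heuristic search (e.g., Monte-Carlo tree search) that an actual agent would use:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def active_inference_loop(world, model, n_steps=100):
    """Simplified discrete-time Active Inference loop (illustrative only)."""
    for _ in range(n_steps):
        # 1. Receive observations.
        obs = world.observe()

        # 2. Update the model of the world (e.g., approximate Bayesian inference).
        model.update_beliefs(obs)

        # 3. "Choose a plan": sample from a softmax whose logits are the
        #    negative expected free energies of the candidate plans. A real
        #    agent would approximate this step heuristically instead of
        #    enumerating plans exhaustively.
        plans = model.enumerate_policies()  # finite, truncated set of plans
        G = np.array([model.expected_free_energy(pi) for pi in plans])
        probs = softmax(-G)
        chosen = plans[np.random.choice(len(plans), p=probs)]

        # 4. Perform only the first action of the chosen plan, then re-plan.
        world.act(chosen[0])
```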
However, considering the above, the title “Plans Are Predictions, Not Optimization Targets”, and the suggestion that “robust bottlenecks” should be “optimisation targets”, don’t make sense. Plans, if successful, have some outcomes (or goals, also known as prior preferences in Active Inference: probability distributions over future world states in which the probability density indicates the level of preference), and, similarly, “robust bottlenecks” also have outcomes (or are themselves outcomes: it’s not clear to me which terminological convention you use in the post in this regard). The ultimate goal of the plan (becoming a doctor, ensuring humanity’s value potential is not destroyed in the course of creating a superhuman AI) is still a goal. What you essentially suggest is frontloading the plan with the achievement of highly universal subgoals (“seeking options” in Nassim Taleb’s lingo). And the current title of the article could be reformulated as “Plans are not optimisation targets, but the outcomes of the plans are optimisation targets”, which doesn’t capture the idea of the post.
“On the path to your goal, identify and try to first achieve the subgoals that are most universally helpful across the various paths towards your goal” would be more accurate, IMO.
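(As an aside, to illustrate the Active Inference notion of a “prior preference” used above: it is simply a probability distribution over future world states with more probability mass on the states the agent prefers. The states and numbers below are invented purely for illustration.)

```python
import numpy as np

# Hypothetical future states for the doctor-to-be example.
states = ["became a doctor", "dropped out", "still studying"]

# Unnormalized preference scores; higher means more preferred.
preference_scores = np.array([5.0, -3.0, 0.0])

# The prior preference is the normalized distribution over these states.
prior_preference = np.exp(preference_scores) / np.exp(preference_scores).sum()
# ≈ [0.993, 0.0003, 0.0067]: nearly all preference mass on "became a doctor".
```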
“Optimisation target” and “optimise for X” are confusing phrases
(This section is not only about the post I’m commenting on; I haven’t yet decided whether to leave it as a comment or turn it into a separate post.)
Despite considerable exposure to writing on AI alignment, I get confused every time I see the phrase “optimisation target”.
As I wrote in a comment to “Reward is not an optimisation target”, the word “optimisation” is ambiguous because it is unclear which “optimisation” it refers to:
Reward probably won’t be a deep RL agent’s primary optimization target
The longer I look at this statement (and its shorter version “Reward is not the optimization target”), the less I understand what it’s supposed to mean, considering that “optimisation” might refer to the agent’s training process as well as the “test” process (even if they overlap or coincide). It looks to me that your idea can be stated more concretely as “the more intelligent/capable RL agents (either model-based or model-free) become in the process of training using the currently conventional training algorithms, the less they will be susceptible to wireheading, rather than actively seek it”?
Separately, the word “target” invokes a sense of teleology/goal-directedness. This is perfectly fine in the context of this post, where people’s “optimisation targets” are discussed, but it becomes confusing when the phrase “optimisation target” is applied to objects or processes that have no agency according to the models most people have in their heads, e. g., a DNN training episode.
Similarly, I think the phrase “optimise for X”, such as “optimise for graduating”, uses alignment slang completely unnecessarily. This phrase is exactly synonymous with “have a goal of graduating” or “try to graduate”, but the latter two don’t carry the veil of alignment-related secondary connotation that people might suppose the phrase “optimise for graduating” has, when, in fact, it doesn’t.
My optimization targets are, instead, the robust bottlenecks.
When reality throws a brick through the plans, I want my optimization target to have still been a good target in hindsight. Thus robust bottlenecks: something which is still a bottleneck under lots of different assumptions is more likely to be a bottleneck even under assumptions we haven’t yet realized we should use. The more robust the bottleneck is, the more likely it will be robust to whatever surprise reality actually throws at us.
In practical advice and self-help, a subset of these “robust bottlenecks” is also simply called “basics”: meeting basic physical and psychological needs, keeping up one’s physical condition and energy, and taking care of one’s mental health.