But wouldn’t such an agent still be motivated to build an external optimizer of unbounded intelligence?
Yes, if it can. Suppose the unbounded intelligence is aligned with the original agent via CEV. The original agent has a pointer to the unbounded intelligence. The unbounded intelligence has a pointer to itself and (because of CEV) a pointer to the original agent. There are now two cyclic references. We have lost our original direct self-reference, but it’s the cyclicality that is central to my post, not self-reference specifically. Self-reference is just a particular example of the general exception.
Does that make sense? The above paragraph is kind of vague, expecting you to fill in the gaps. (I cheated too, by assuming CEV.) But I can phrase things more precisely and break them into smaller pieces, if you would prefer it that way.
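For concreteness, the pointer structure above can be sketched as a toy object graph (purely illustrative; the Node class and everything in it are hypothetical stand-ins, not an agent design):

```python
class Node:
    # Toy stand-in for an agent; 'points_to' models the pointers in the argument above.
    def __init__(self, name: str):
        self.name = name
        self.points_to = []


original = Node("original agent")
unbounded = Node("unbounded intelligence")

original.points_to.append(unbounded)   # original agent -> unbounded intelligence
unbounded.points_to.append(unbounded)  # unbounded intelligence -> itself
unbounded.points_to.append(original)   # unbounded intelligence -> original agent (via CEV)

# The graph now has two cycles: the self-loop on 'unbounded', and the two-node
# cycle original -> unbounded -> original. The original agent no longer needs
# a direct pointer to itself.
```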
It’s embedded in a world (edit: the external optimizer is), so there is always some circularity, but I think that’s mostly about avoiding mindcrime and such? That doesn’t seem like a constraint on the level of intelligence, so the orthogonality thesis should be content. CEV being complicated, and its finer points being far in the logical future, fall under goal complexity and don’t need to appeal to cyclic references.
The post says things about wireheading and world models and search functions, but it’s optimizers with unconstrained design that we are talking about. So the proper frame seems to be decision theory, which is unclear for embedded agents, and a failing design is more of a thought experiment that motivates something about a better decision theory.
When you say “It’s”, are you referring to the original agent or to the unbounded intelligence it wants to create? I think you’re referring to the unbounded intelligence, but I want to be sure.
To clarify: I never intended to claim that the Orthogonality Thesis is violated due to a constraint on the level of intelligence. I claim that the Orthogonality Thesis is violated due to a constraint on viable values, after the intelligence of a world optimizer gets high enough.
Both are embedded in the world, but I meant the optimizer in that sentence. The original agent is even more nebulous than the unconstrained optimizer, since it might be operating under unknown constraints on design. (So it could well be Cartesian, without self-references. If we are introducing a separate optimizer, and only keeping idealized goals from the original agent, there is no more use for the original agent in the resulting story.)
In any case, a more general embedded decision-theoretic optimizer should be defined with awareness of the fact that it’s acting from within its world. What this should say about the optimizer itself is a question for decision theory that motivates its design.
Are you trying to advocate for decision theory? You write that this is “a question for decision theory”. But you also write that decision theory is “unclear for embedded agents”. And this whole conversation is exclusively about embedded agents. What parts are you advocating we use decision theory on, and what parts are you advocating we don’t use decision theory on? I’m confused.
It’s a question of what decision theory for embedded agents should be, for which there is no clear answer. Without figuring that out, designing an optimizer is an even murkier endeavor, since we don’t have desiderata for it that make sense, and formulating such desiderata is what decision theory is about. So saying that decision theory for embedded agents is unclear is saying that designing embedded optimizers remains an ill-posed problem.
I’m combining our two threads into one. Click here for continuation.
[Note: If clicking on the link doesn’t work, then that’s a bug with LW. I used the right link.] [Edit: It was the wrong link.]
It is something of a bug with LW that it gives you the wrong link to use (notice the #Wer2Fkueti2EvqmqN part of the link, which is the wrong part). The right link is this. It can be obtained by clicking “See in context” at the top of the page. (The threads remain uncombined, but at least they now have different topics.)
Fixed. Thank you.
Oh! I think I understand your argument now. If I understand it correctly (and I might not), then your argument is an exception covered by this footnote. Creating an aligned superintelligence ends the need for maintaining a correct world model in the future for the same reason dying does: your future agentic impact after the pivotal act is negligible.
My argument is a vague objection to the overall paradigm of “let’s try to engineer an unconstrained optimizer”. I think it makes more sense to ask how decision theory for embedded agents should work, and then do what it recommends. The post doesn’t engage with that framing in a way I can easily follow, so I don’t really understand it.
The footnote appears to refer to something about the world-model component of the engineered optimizer you describe? But also to putting things into the goal, which shouldn’t be allowed? General consequentialist agents don’t respect the boundaries of their own design and would eat any component of themselves, such as a world model, if that looked like a good idea. Which is one reason to talk about decision theories and not agent designs.
My post doesn’t engage with your framing at all. I think decision theory is the wrong tool entirely, because decision theory takes as a given the hardest part of the problem. I believe decision theory cannot solve this problem, and I’m working from a totally different paradigm.
Our disagreement is as wide as if you were a consequentialist and I was arguing from a Daoist perspective. (Actually, that might not be far from the truth. Some components of my post have Daoist influences.)
Don’t worry about trying to understand the footnote. Our disagreement appears to run much deeper than the footnote itself.
because decision theory takes as a given the hardest part of the problem

What’s that?
My post doesn’t engage with your framing at all.

Sure, it was intended as a not-an-apology for not working harder to reframe the implied desiderata behind the post in a way I prefer. I expect my true objection to remain the framing, but now I’m additionally confused about the “takes as a given” remark about decision theory; nothing comes to mind as a possibility.
It’s philosophical. I think it’d be best for us to terminate the conversation here. My objections to the overuse of decision theory are sophisticated enough (and distinct enough from what this post is about) that they deserve their own top-level post.
My short answer is that decision theory is based on Bayesian probability, and that Bayesian probability has holes related to a poorly-defined (in embedded material terms) concept of “belief”.
Thank you for the conversation, by the way. This kind of high-quality dialogue is what I love about LW.
Sure. I’d still like to note that I agree about Bayesian probability being a hack that should be avoided if at all possible, but I don’t see it as an important part (or any part at all) of framing agent design as a question of decision theory (essentially, of formulating desiderata for agent design before getting more serious about actually designing agents).
For example, proof-based open-source decision theory simplifies the problem to a ridiculous degree in order to examine more closely some essential difficulties of embedded agency (including self-reference), and it makes no use of probability, whether in its modal logic variant or not (a toy sketch of the open-source setting is below). Updatelessness more generally tries to live without Bayesian updating.
Though there are always occasions to be reminded of probability, like the recent mystery about expected utility and updatelessness.
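A minimal sketch of the “open source” setting mentioned above, where each strategy gets to read its opponent’s source code. The CliqueBot-style textual comparison here is a deliberately crude stand-in (not something from this thread); proof-based variants replace it with a search for proofs about the opponent’s behavior:

```python
# Toy "open source" prisoner's dilemma: each strategy is handed the opponent's
# source code before choosing "C" (cooperate) or "D" (defect).
import inspect


def clique_bot(opponent_source: str) -> str:
    # Cooperate iff the opponent's source text is exactly my own.
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"


def defect_bot(opponent_source: str) -> str:
    # Ignores the opponent's source and always defects.
    return "D"


def play(a, b):
    # Each player sees the other's source code.
    return a(inspect.getsource(b)), b(inspect.getsource(a))


if __name__ == "__main__":
    print(play(clique_bot, clique_bot))  # ('C', 'C'): cooperation grounded in reading source
    print(play(clique_bot, defect_bot))  # ('D', 'D')
```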