(I super-upvoted this, since asking stupid questions is a major flinch/ugh field)
Ok, my stupid question, asked in a blatantly stupid way, is:
where does the decision theory stuff fit in The Plan? I have gotten the notion that it’s important for Value-Preserving Self-Modification in a potential AI agent, but I’m confused because it all sounds too much like game theory: there are all these other agents it deals with. If it’s not for VPSM, and is in fact some exploration of how an AI would deal with potential agents, why is this important at all? Let the AI figure that out; it’s going to be smarter than us anyway.
If there is some Architecture document I should read to grok this, please point me there.
I have gotten the notion that it’s important for Value-Preserving Self-Modification in a potential AI agent, but I’m confused because it all sounds too much like game theory: there are all these other agents it deals with
My impression is that, with self-modification and time, continuity of identity becomes a sticky issue. If I can become an entirely different person tomorrow, how I structure my life is not the weak game theory of “how do I bargain with another me?” but the strong game theory of “how do I bargain with someone else?”
I think Eliezer’s reply (point ‘(B)’) to this comment by Wei Dai provides some explanation of what the decision theory is doing here.
From the reply (concerning UDT):
I still think [an AI ought to be able to come up with these ideas by itself], BTW. We should devote some time and resources to thinking about how we are solving these problems (and coming up with questions in the first place). Finding that algorithm is perhaps more important than finding a reflectively consistent decision algorithm, if we don’t want an AI to be stuck with whatever mistakes we might make.
And yet you found a reflectively consistent decision algorithm long before you found a decision-system-algorithm-finding algorithm. That’s not coincidence. The latter problem is much harder. I suspect that even an informal understanding of parts of it would mean that you could find timeless decision theory as easily as falling backward off a tree—you just run the algorithm in your own head. So with very high probability you are going to start seeing through the object-level problems before you see through the meta ones. Conversely I am EXTREMELY skeptical of people who claim they have an algorithm to solve meta problems but who still seem confused about object problems. Take metaethics, a solved problem: what are the odds that someone who still thought metaethics was a Deep Mystery could write an AI algorithm that could come up with a correct metaethics? I tried that, you know, and in retrospect it didn’t work.
The meta algorithms are important, but by their very nature, knowing even a little about the meta-problem tends to make the object problem much less confusing, and you will progress on the object problem faster than on the meta problem. Again, that’s not saying the meta problem is unimportant. It’s just saying that it’s really hard to end up in a state where meta has really truly run ahead of object, though it’s easy to get illusions of having done so.
Other agents are complicated regularities in the world (or in a more general decision problem setting). Noticing where our understanding of what’s going on breaks down when we try to optimize in other agents’ presence is a good heuristic for spotting gaps in our understanding of the idea of optimization.
I think the main reason is simple: it’s hard to create a transparent/reliable agent without decision theory. Also, since we’re talking about a superpowerful agent, you don’t want to mess this up. CDT and EDT are known to mess up, so it would be very helpful to find a “correct” decision theory. Though you may somehow be able to get around it by letting an AI self-improve, it would be nice to have one less thing to worry about, especially because how the AI improves is itself a decision.
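To make “CDT and EDT are known to mess up” concrete, here is a toy sketch (my own illustration, not from the thread) of Newcomb’s problem, the textbook case against CDT: CDT holds the predictor’s already-made choice causally fixed and so always two-boxes, while EDT conditions on the chosen action and one-boxes. (EDT has its own standard failure cases, e.g. the smoking lesion, which this sketch doesn’t cover.) The predictor accuracy and payoffs below are assumed values.

```python
# Toy Newcomb's problem: a predictor with accuracy ACC puts $1,000,000 in the
# opaque box iff it predicts you will one-box; the transparent box always
# holds $1,000.  ACC and the payoffs are assumptions for illustration only.

ACC = 0.99
SMALL, BIG = 1_000, 1_000_000

def edt_value(action):
    """EDT: treat the chosen action as evidence about the prediction."""
    p_big = ACC if action == "one-box" else 1 - ACC
    return p_big * BIG + (SMALL if action == "two-box" else 0)

def cdt_value(action, p_big_fixed):
    """CDT: the box contents are causally fixed no matter what we choose."""
    return p_big_fixed * BIG + (SMALL if action == "two-box" else 0)

# EDT one-boxes: expected $990,000 vs. $11,000.
print({a: edt_value(a) for a in ("one-box", "two-box")})

# CDT two-boxes for every fixed belief about the box contents (the extra
# $1,000 always dominates), and so walks away with $1,000 against an
# accurate predictor.
for p in (0.0, 0.5, 1.0):
    print(p, {a: cdt_value(a, p) for a in ("one-box", "two-box")})
```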