Thanks for the link! Now I understand Eliezer’s question a little better. Let me try to think out loud and see if I’m right...
The Gödel machine works like this: it outputs a stream of actions according to some not-very-good algorithm, and keeps looping through proofs on the side. As soon as it finds a proof that replacing its whole source code leads to higher utility (from the next turn onwards) than keeping it as is, it replaces its whole source code. There’s no talk about replacing just the proof verifier.
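To make sure I'm reading the setup right, here's a toy rendering of that top-level loop. The environment, starting policy, and proof searcher below are hypothetical stubs of my own, not Schmidhuber's pseudocode; the real machine enumerates proofs in a fixed formal system, which is exactly the hard part this sketch leaves out:

```python
# Toy sketch of the Gödel machine's top-level loop, as I understand it.
# The policy and proof searcher are hypothetical stubs; the real proof
# searcher enumerates proofs in a fixed axiomatic system.

import random

def initial_policy(observation):
    """The not-very-good starting algorithm: here, just act randomly."""
    return random.choice(["left", "right", "wait"])

def proof_search_step(current_policy, step):
    """One step of proof enumeration.  Returns a replacement program if a
    proof has been found that switching to it yields higher utility from
    the next turn onward than keeping current_policy, else None.
    (The real test is undecidable in general; this stub never succeeds.)"""
    return None

def godel_machine(steps=10):
    policy = initial_policy            # the machine's entire current program
    for t in range(steps):
        # Object level: act in the world with the current policy.
        observation = t                # placeholder observation
        action = policy(observation)

        # Meta level: keep looping through proofs on the side.
        replacement = proof_search_step(policy, t)
        if replacement is not None:
            # Proof found: rewrite the whole source code, proof searcher included.
            policy = replacement

godel_machine()
```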
Any such improvement is “not greedy” by Schmidhuber’s “Global optimality theorem”, which notes that the utility of staying with the old source code for another turn already takes into account all other possible self-improvements that the machine could do in the future. So the very first self-improvement found by the machine will likely look like an immediate jump to optimality.
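In my paraphrase (not the paper's exact notation), the target theorems the proof search is looking for have the form

$$ u\big[\text{state at } t_1 \text{ with the rewrite executed},\ \mathrm{Env}\big] \;>\; u\big[\text{state at } t_1 \text{ with the old code kept},\ \mathrm{Env}\big], $$

where $u$ is expected utility under the machine's axiomatized prior, evaluated from the next turn onward. Because the right-hand side already accounts for every future self-rewrite the old code would find on its own, any rewrite that gets accepted is already "globally optimal" in that sense.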
The question arises, if the first self-improvement is guaranteed to be so awesome, will the machine ever find it at all, or will the proof search loop forever?
In some simple scenarios the proof search will in fact terminate. For example, if the machine has the option of pressing a lever that maxes out the reward, it can easily prove that self-modifying to always press the lever has optimal utility compared to any other self-improvement it can come up with. Note that in such simple scenarios the machine doesn’t actually get to upgrade its proof verifier to a more advanced one, it just jumps to the best strategy.
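To make that concrete with toy numbers of my own (not from the paper): if per-step reward is bounded above by $r_{\max}$ and pressing the lever yields exactly $r_{\max}$ on every remaining step, then for any alternative program $p$

$$ u(p) \;\le\; \sum_{t=t_1}^{T} r_{\max} \;=\; u(\text{always press the lever}), $$

where $T$ is the machine's remaining lifetime. That comparison is a short, easily formalizable argument, so the proof search halts and the machine jumps straight to the lever-pressing strategy.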
There are also more complex scenarios. For example, sometimes you can improve your starting algorithm a lot but the ultimate optimal strategy is uncomputable, or maybe it’s computable but the proof of optimality is out of your axiom system’s reach. In such cases it’s not clear to me why the machine’s proof search should ever terminate, and I couldn’t find a convincing answer to that in Schmidhuber’s writings, but Löb’s theorem seems to be only one of the many possible obstacles here.
Eliezer’s question seems to be mostly about such complex scenarios. I think it’s more enlightening to ask directly whether the Gödel machine will execute even one self-improvement, without dragging in Löb’s theorem. This gives us a hard math question that I’m too lazy and/or stupid to answer :-(
Well, I don’t think I can solve this either, at least not quickly enough to justify the time spent. The combination of Löb’s theorem and the requirement for absolute proof (it couldn’t even tentatively justify the use of, say, RSA without a proof that it is unbreakable) severely limits its abilities. Rereading the paper, I noticed that, since it explicitly compares the new design with the old one, it gets around a few Löbian difficulties, but this is clearly not a solution, because we end up with a weird asymmetry between agents that the Gödel machine is willing to self-modify into and agents that the Gödel machine is willing to replace itself with through any mechanism other than the self-improvement module.
On a completely different note, this post made me realize how horribly our knowledge is organized. This is an open problem that Eliezer believes is critical to the creation of FAI, and in order to find information on it we have to read 7-year-old SL4 posts and think about aspects of papers that have clearly been thought about before by very smart people, and yet we are still left with questions to which it is reasonably likely that someone knows the answer. You’re one of our decision theory people; you’ve been thinking about these things for years! If this were sanely organized, you would have heard about the ideas that someone already understands many times over the course of the decision theory discussions.
After thinking about the problem some more, I no longer understand why Eliezer wants the machine to be fully meta and self-modifying. Imagine the following design:
1) An object-level program O that outputs actions into the world. Initially it just sits still and does nothing.
2) A meta-level program M that contains a prior about the world and a utility function, and slowly enumerates all possible proofs in some formal system. When M finds a proof that some object-level program O’ achieves higher expected utility than O, it replaces O with O’ and goes on searching. When M finds a proof that the current O is optimal, it stops.
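A minimal sketch of that two-part design, with hypothetical names and the proof enumeration stubbed out (since that's still the hard part):

```python
# Toy sketch of the O/M split described above.  `enumerate_proofs` and the
# object-level programs are hypothetical stubs; M itself never changes.

def initial_O(observation):
    """Object-level program O: initially it just sits still and does nothing."""
    return "do_nothing"

def enumerate_proofs(current_O, index):
    """One step of M's proof enumeration in some fixed formal system.  Returns
    ('better', O_prime) if proof number `index` shows O_prime has higher
    expected utility than current_O, ('optimal', None) if it shows current_O
    is optimal, and None otherwise.  This stub never proves anything."""
    return None

def meta_program_M(max_proofs=1000):
    O = initial_O
    for i in range(max_proofs):          # slowly enumerate all possible proofs
        result = enumerate_proofs(O, i)
        if result is None:
            continue
        verdict, O_prime = result
        if verdict == "better":
            O = O_prime                  # replace O and keep searching
        elif verdict == "optimal":
            break                        # current O proven optimal: M stops
    return O

meta_program_M()
```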
In this setup M can’t modify itself, but that doesn’t matter. If it were able to self-modify, it would spend perhaps a century on finding the initial self-modification, but now it can spend the same century on making O self-modifying instead, so it seems to me that we keep most of the hypothetical speedup. Another bonus is that the combination of O and M looks much easier to reason about than a fully meta machine, e.g. M will obviously find many easy speedups for O without running into Löb or anything. Is there any reason to want a fully meta machine over this?
(Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M’s or its own hardware with its mining claws. Still, these things are interesting to figure out.)
Wouldn’t any solution to this problem enable M to reason about itself, forcing it to consider the effects of being modified by O?
Yeah. This family of questions is the most important one we don’t know how to answer. Maybe Eliezer and Marcello have solved it, but they’re keeping mum.
I don’t think so. For one thing, Eliezer keeps talking about how important it is to solve this. I don’t think his personal ethics would let him try to spread awareness of an open problem if he had actually solved it. Also, he seems to think that if the public knew the answer, it would reduce the chance of the creation of uFAI, so unless the solution was very different from what he expected (I’m thinking of it providing a huge clue toward AGI, but that sounds unlikely), he would have an incentive to make it public.
Does Eliezer keep talking about the thing I called “the drawback of AIXI”? It seems to me that he keeps talking about the “AI reflection problem”, which is different. And yeah, solving any of these problems would make AGI easier without making FAI much easier.
No, I was referring to the AI reflection problem in the grandparent.
I don’t know if that would make AGI much easier. Even with a good reflective decision theory, you’d still need a more efficient framework for inference than an AIXI-style brute-force algorithm. On the other hand, if you could do inference well, you might still be able to make a working AI without solving reflection, but it would be harder to understand its goal system, making it less likely to be Friendly. The lack of reflectivity could be an obstacle, but I think it’s more likely that, given a powerful inference algorithm and a lack of concern for the dangers of AI, it wouldn’t be that hard to make something dangerous.