I think it’s more enlightening to ask directly whether the Gödel machine will execute even one self-improvement, without dragging in Löb’s theorem. This gives us a hard math question that I’m too lazy and/or stupid to answer :-(
Well I don’t think I can solve this either, at least not quickly enough to justify the use of time. The combination of Loeb’s theorem and the requirement for absolute proof (it couldn’t even tentatively justify the use of, say, RSA without a proof that it is unbreakable) severely limits its abilities. Rereading the paper, I noticed that, since it explicitly compares the new design with the old one, it gets around a few Loebian difficulties, but this is clearly not a solution because we end up with a weird asymmetry between agent that the Goedel machine is willing to self-modify into and agents that the Godel machine is willing to replace itself with through any mechanism other that the self-improvement module.
On a completely different note, this post made me realize how horribly our knowledge is organized. This is an open problem that Eliezer believes is critical to the creation of FAI and in order to find information on it we have to read 7 year old sl4 posts and think about aspects of papers that have clearly been thought about before by very smart people, and yet we are still left with questions for which it is reasonably likely that someone knows the answer. You’re one of our decision theory people—you’ve been thinking about these things for years! If this was sanely organized, you would have heard about the ideas that someone already understand many times over the course of the decision theory discussions.
After thinking about the problem some more, I no longer understand why Eliezer wants the machine to be fully meta and self-modifying. Imagine the following design:
1) An object-level program O that outputs actions into the world. Initially it just sits still and does nothing.
2) A meta-level program M that contains a prior about the world and a utility function, and slowly enumerates all possible proofs in some formal system. When M finds a proof that some object-level program O’ achieves higher expected utility than O, it replaces O with O’ and goes on searching. When M finds a proof that the current O is optimal, it stops.
In this setup M can’t modify itself, but that doesn’t matter. If it were able to self-modify, it would spend perhaps a century on finding the initial self-modification, but now it can spend the same century on making O self-modifying instead, so it seems to me that we keep most of the hypothetical speedup. Another bonus is that the combination of O and M looks much easier to reason about than a fully meta machine, e.g. M will obviously find many easy speedups for O without running into Löb or anything. Is there any reason to want a fully meta machine over this?
(Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M’s or its own hardware with its mining claws. Still, these things are interesting to figure out.)
Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M’s or its own hardware with its mining claws. Still, these things are interesting to figure out.
Wouldn’t any solution to this problem enable M to reason about itself, forcing it to consider the effects of being modified by O?
Yeah. This family of questions is the most important one we don’t know how to answer. Maybe Eliezer and Marcello have solved it, but they’re keeping mum.
I don’t think so. For one thing, Eliezer keeps talking about how important it is to solve this. I don’t think his personal ethics would let him try to spread awareness of an open problem if he had actually solved it. Also, he seems to think that if the public knew the answer, it would reduce the chance of the creation of uFAI, so, unless the solution was very different than he expected—I’m thinking of it providing a huge clue toward AGI, but that sounds unlikely—he would have incentive to make it public.
Does Eliezer keep talking about the thing I called “the drawback of AIXI”? It seems to me that he keeps talking about the “AI reflection problem”, which is different. And yeah, solving any of these problems would make AGI easier without making FAI much easier.
No, I was referring to the AI reflection problem in the grandparent.
I don’t know if that would make AGI much easier. Even with a good reflective decision theory, you’d still need a more efficient framework for inference than an AIXI-style brute force algorithm. On the other hand, if you could do inference well, you might still make a working AI without solving reflection, but it would be harder to understand its goal system, making it less likely to be friendly. The lack of reflectivity could be an obstacle, but I think that it is more likely that, given a powerful inference algorithm and a lack of concern for the dangers of AI, it wouldn’t be that hard to make something dangerous.
Well I don’t think I can solve this either, at least not quickly enough to justify the use of time. The combination of Loeb’s theorem and the requirement for absolute proof (it couldn’t even tentatively justify the use of, say, RSA without a proof that it is unbreakable) severely limits its abilities. Rereading the paper, I noticed that, since it explicitly compares the new design with the old one, it gets around a few Loebian difficulties, but this is clearly not a solution because we end up with a weird asymmetry between agent that the Goedel machine is willing to self-modify into and agents that the Godel machine is willing to replace itself with through any mechanism other that the self-improvement module.
On a completely different note, this post made me realize how horribly our knowledge is organized. This is an open problem that Eliezer believes is critical to the creation of FAI and in order to find information on it we have to read 7 year old sl4 posts and think about aspects of papers that have clearly been thought about before by very smart people, and yet we are still left with questions for which it is reasonably likely that someone knows the answer. You’re one of our decision theory people—you’ve been thinking about these things for years! If this was sanely organized, you would have heard about the ideas that someone already understand many times over the course of the decision theory discussions.
After thinking about the problem some more, I no longer understand why Eliezer wants the machine to be fully meta and self-modifying. Imagine the following design:
1) An object-level program O that outputs actions into the world. Initially it just sits still and does nothing.
2) A meta-level program M that contains a prior about the world and a utility function, and slowly enumerates all possible proofs in some formal system. When M finds a proof that some object-level program O’ achieves higher expected utility than O, it replaces O with O’ and goes on searching. When M finds a proof that the current O is optimal, it stops.
In this setup M can’t modify itself, but that doesn’t matter. If it were able to self-modify, it would spend perhaps a century on finding the initial self-modification, but now it can spend the same century on making O self-modifying instead, so it seems to me that we keep most of the hypothetical speedup. Another bonus is that the combination of O and M looks much easier to reason about than a fully meta machine, e.g. M will obviously find many easy speedups for O without running into Löb or anything. Is there any reason to want a fully meta machine over this?
(Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M’s or its own hardware with its mining claws. Still, these things are interesting to figure out.)
Wouldn’t any solution to this problem enable M to reason about itself, forcing it to consider the effects of being modified by O?
Yeah. This family of questions is the most important one we don’t know how to answer. Maybe Eliezer and Marcello have solved it, but they’re keeping mum.
I don’t think so. For one thing, Eliezer keeps talking about how important it is to solve this. I don’t think his personal ethics would let him try to spread awareness of an open problem if he had actually solved it. Also, he seems to think that if the public knew the answer, it would reduce the chance of the creation of uFAI, so, unless the solution was very different than he expected—I’m thinking of it providing a huge clue toward AGI, but that sounds unlikely—he would have incentive to make it public.
Does Eliezer keep talking about the thing I called “the drawback of AIXI”? It seems to me that he keeps talking about the “AI reflection problem”, which is different. And yeah, solving any of these problems would make AGI easier without making FAI much easier.
No, I was referring to the AI reflection problem in the grandparent.
I don’t know if that would make AGI much easier. Even with a good reflective decision theory, you’d still need a more efficient framework for inference than an AIXI-style brute force algorithm. On the other hand, if you could do inference well, you might still make a working AI without solving reflection, but it would be harder to understand its goal system, making it less likely to be friendly. The lack of reflectivity could be an obstacle, but I think that it is more likely that, given a powerful inference algorithm and a lack of concern for the dangers of AI, it wouldn’t be that hard to make something dangerous.