After thinking about the problem some more, I no longer understand why Eliezer wants the machine to be fully meta and self-modifying. Imagine the following design:
1) An object-level program O that outputs actions into the world. Initially it just sits still and does nothing.
2) A meta-level program M that contains a prior about the world and a utility function, and slowly enumerates all possible proofs in some formal system. When M finds a proof that some object-level program O’ achieves higher expected utility than O, it replaces O with O’ and goes on searching. When M finds a proof that the current O is optimal, it stops.
In this setup M can’t modify itself, but that doesn’t matter. If it were able to self-modify, it might spend perhaps a century finding its first self-modification; in this design it can spend that same century making O self-modifying instead, so it seems to me that we keep most of the hypothetical speedup. Another bonus is that the combination of O and M looks much easier to reason about than a fully meta machine, e.g. M will obviously find many easy speedups for O without running into Löb or anything. Is there any reason to want a fully meta machine over this?
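To make the loop concrete, here is a minimal toy sketch of the O/M split in Python. Everything in it is illustrative: the “proof search” is stood in for by directly computing expected utility of a few hand-written candidate programs against a tiny finite prior, since an actual proof enumerator obviously doesn’t fit in a comment, and all the names (worlds, actions, utilities) are made up for the example.

```python
# Toy sketch of the O/M design above. The real M enumerates proofs in a
# formal system; here "M found a proof that O' beats O" is replaced by
# brute-force expected-utility comparison over a small hand-written prior.

def do_nothing(world):
    # Initial object-level program O: sits still no matter what.
    return "sit_still"

def candidate_programs():
    # Enumerate object-level programs O' in some fixed order (toy version).
    yield do_nothing
    yield lambda world: "mine" if world == "ore_nearby" else "sit_still"
    yield lambda world: "mine"  # mine unconditionally

def expected_utility(program, prior, utility):
    # M's estimate: average the utility of the program's action over the prior.
    return sum(p * utility(world, program(world)) for world, p in prior.items())

def meta_level(prior, utility):
    # M: start with the do-nothing O, replace it whenever a candidate O'
    # is shown to achieve strictly higher expected utility, keep searching.
    O = do_nothing
    best = expected_utility(O, prior, utility)
    for O_prime in candidate_programs():
        eu = expected_utility(O_prime, prior, utility)
        if eu > best:  # stands in for "M finds a proof that O' beats O"
            O, best = O_prime, eu
    return O  # the real M would stop only on a proof that the current O is optimal

if __name__ == "__main__":
    prior = {"ore_nearby": 0.3, "empty": 0.7}

    def utility(world, action):
        # Mining pays off only when there is ore; mining an empty world wastes effort.
        if action == "mine":
            return 1.0 if world == "ore_nearby" else -0.5
        return 0.0

    O = meta_level(prior, utility)
    print(O("ore_nearby"), O("empty"))  # -> mine sit_still
```

The point the sketch is meant to make visible is that M itself never changes: all the interesting modification happens to O, which is why the combination seems easier to reason about than a fully meta machine.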
(Of course all such models share the drawback of AIXI: they think programs live outside the universe, so O can inadvertently destroy M’s or its own hardware with its mining claws. Still, these things are interesting to figure out.)
Wouldn’t any solution to this problem enable M to reason about itself, forcing it to consider the effects of being modified by O?
Yeah. This family of questions is the most important one we don’t know how to answer. Maybe Eliezer and Marcello have solved it, but they’re keeping mum.
I don’t think so. For one thing, Eliezer keeps talking about how important it is to solve this, and I don’t think his personal ethics would let him spread awareness of an open problem he had actually solved. Also, he seems to think that if the public knew the answer, it would reduce the chance of uFAI being created, so unless the solution turned out very different from what he expected (say, by providing a huge clue toward AGI, which sounds unlikely), he would have an incentive to make it public.
Does Eliezer keep talking about the thing I called “the drawback of AIXI”? It seems to me that he keeps talking about the “AI reflection problem”, which is different. And yeah, solving any of these problems would make AGI easier without making FAI much easier.
No, I was referring to the AI reflection problem in the grandparent.
I don’t know if that would make AGI much easier. Even with a good reflective decision theory, you’d still need a more efficient framework for inference than an AIXI-style brute-force algorithm. On the other hand, if you could do inference well, you might still make a working AI without solving reflection, but it would be harder to understand its goal system, making it less likely to be friendly. The lack of reflectivity could be an obstacle, but more likely, given a powerful inference algorithm and no concern for the dangers of AI, it wouldn’t be that hard to make something dangerous.