Sorry for the long-delayed reply, Wei!

So you think that humans do not have a built-in solution to the Löbstacle, and you must also think we are capable of building an FAI that does have a built-in solution to the Löbstacle. That means an intelligence without a solution to the Löbstacle can produce another intelligence that shares its values and does have a solution to the Löbstacle.
Yup.
But then why is it necessary for us to solve this problem? [...] Why can’t we instead build an FAI without solving this problem, and depend on the FAI to solve the problem while it’s designing the next-generation FAI?
I have two main reasons in mind. First, suppose you grant that (a) this is a problem that would take humans years of serial research to solve, and (b) it looks much easier to build a solution into an AI designed from scratch than to bolt one onto an existing AI design created without these considerations in mind, but you still think that (c) a good plan would be to have the first-generation FAI solve this problem when building the next-generation FAI. Then it seems you need to assume that the FAI will be much better at AGI design than its human designers before it executes its first self-rewrite: by assumption, the human team would still need years to solve the problem at that point, and the plan wouldn’t be particularly helpful if the first-generation FAI needed a similar amount of time or longer. But it seems unlikely to me that we first need to build ultraintelligent machines à la I. J. Good, far surpassing humans, before we can get an intelligence explosion; most of the probability mass seems to me to be on the required level of AGI research ability being at or below that of the human research team working on the AGI. I admit that one possible strategy would be to keep having humans improve the initial FAI until it is superintelligent and then ask it to write a successor from scratch, solving the Löbstacle in the process, but it doesn’t seem particularly likely that this would be cheaper than solving the problem beforehand.
Second, if we followed this plan, then when building the initial FAI we would be unable to use mathematical logic (or other tools similar enough to be subject to the same issues) in a straightforward way when having it reason about its potential successor. This cuts off a large part of the design space I’d naturally be looking at. Yes, if we can do it, then it is possible in principle to get an FAI to do it, but mimicking human reasoning doesn’t seem to me like the easiest way to build a safe AGI.
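For reference, the formal fact behind the Löbstacle is the textbook statement of Löb’s theorem below, where T stands for the theory the agent reasons in and Prov_T for its provability predicate; the reading in terms of trusting a successor is my rough gloss of the obstacle, not a full argument.

\[
\text{(Löb's theorem)}\qquad \text{if } T \vdash \mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P \text{, then } T \vdash P.
\]

Roughly, an agent reasoning in T cannot accept the schema Prov_T(⌜P⌝) → P for arbitrary P (which is approximately what naively trusting a successor that proves its actions safe in T would require) unless T already proves each such P outright.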
Have you been following the discussions under my most recent post?
I agree with you that relying on an FAI team to solve a large number of philosophical problems correctly seems dangerous, although I’m sympathetic to Eliezer’s criticism of your outside-view arguments: I essentially agree with your conclusions, but I think I arrive at them through more inside-view reasoning (I’d need to think longer to tease this apart). I agree with Paul that something like CEV for philosophy, in addition to values, should probably be part of an FAI design. I agree with you that progress in metaphilosophy would be very valuable, but I do not have any concrete leads to follow. Still, I think that having good solutions to some of these problems is not unlikely to be helpful for FAI design (and more helpful to FAI than to uFAI), so some amount of work allocated to these philosophical problems still looks like a good thing to me; and I also think that working on these problems reduces, on average, the probability of making a bad mistake, even if we manage to have the FAI do philosophy itself and have it checked by “coherent extrapolated philosophy”.
You quoted my earlier comment saying that I think making object-level progress is important enough to be a net positive despite making AGI research more interesting, but I don’t feel that your post, or the discussion below it, contains much in the way of arguments about that point. Could you elaborate on the connection?