I’m a little confused though. I’m aware of Yudkowsky’s misgivings regarding the possible failings of prosaic AGI alignment, but I’m not sure where he states it to be borderline impossible or worse. Also, when you refer to MIRI being highly pessimistic of prosaic AGI alignment, are you referring to the organization as a whole, or a few key members?
I also don’t understand why this disparity of projections exists. Is there a more implicit part of the argument that neither party (Paul Christiano nor MIRI) has publicly addressed?
EDIT: Is the argument more that it isn’t currently possible due to a lack of understanding of what corrigibility even is, without taking a position on how possible it might be some years down the line?
I’m not sure where he states it to be borderline impossible or worse.
Here’s a recent comment, which doesn’t exactly say that but seems pretty close.
When you refer to MIRI being highly pessimistic of prosaic AGI alignment, are you referring to the organization as a whole, or a few key members?
I don’t know—people at MIRI don’t say much about their views; I’m generally responding to a stereotyped caricature of what people associate with MIRI because I don’t have any better model. (You can see some more discussion about this “MIRI viewpoint” here.) I’ve heard from other people that these viewpoints should be most associated with Nate, Eliezer and Benya, but I haven’t verified this myself.
I also don’t understand why this disparity of projections exists. Is there a more implicit part of the argument that neither party (Paul Christiano nor MIRI) has addressed?
I don’t know. To my knowledge the “doom” camp hasn’t really responded to the points raised, though here is a notable exception.
The most glaring argument I can see raised against Christiano’s IDA is that it assumes a functioning AGI would already be developed before measures are taken to make it corrigible. Then again, that objection may well stem from a misunderstanding on my part. It’s also possible that MIRI would simply prefer that the field prioritize other work over what looks like preparation for non-FOOM scenarios. But I don’t understand how it couldn’t “possibly, possibly, possibly work”.