I wonder if making kludge AI safe might be the less-impossible option here.
Yeah, that’s possible. But as I said here, I suspect that learning whether that’s true mostly comes from doing FAI research (and from watching closely as the rest of the world inevitably builds toward Kludge AI). Also: if making Kludge AI safe is the less-impossible option, then at least some FAI research probably works just as well for that scenario — especially the value-loading problem stuff. MIRI hasn’t focused on that lately but that’s a local anomaly: some of the next several open problems on Eliezer’s to-explain list fall under the value-loading problem.
I’m not sure how value-loading would apply to that situation, since you’re implicitly assuming a non-steadfast goal system as the default case of a kludge AI. Wouldn’t boxing be more applicable?
Well, there are many ways it could turn out that making Kludge AI safe is the less-impossible option. The way I had in mind was that maybe goal stability and value-loading turn out to be surprisingly feasible with Kludge AI, and you really can just “bolt on” Friendliness. I suppose another way making Kludge AI safe could be the less-impossible option is if it turns out to be possible to keep superintelligences boxed indefinitely but also use them to keep non-boxed superintelligences from being built, or something. In which case boxing research would be more relevant.