I’m not sure how value-loading would apply to that situation, since you’re implicitly assuming a non-steadfast goal system as the default case of a Kludge AI. Wouldn’t boxing be more applicable?
Well, there are many ways it could turn out that making Kludge AI safe is the less-impossible option. The way I had in mind was that maybe goal stability and value-loading turn out to be surprisingly feasible with Kludge AI, and you really can just “bolt on” Friendliness. I suppose another way making Kludge AI safe could turn out to be the less-impossible option is if it’s possible to keep superintelligences boxed indefinitely while also using them to prevent non-boxed superintelligences from being built, or something, in which case boxing research would be more relevant.