I’d rather say that some paths to alignment are about lacking certain capabilities.
If ultimately you want an AI to definitely-have some set of capabilities and lack some other set, you can get there by any combination of addition and subtraction (in the context of current AI that’s like a growing blob of capabilities). But if some of the capabilities we want are pretty specialized (like ones that relate to precisely interpreting our preferences), it might be faster to add or accelerate them somehow rather than waiting for capabilities to grow past them and then pruning back everything-that’s-not-what-we-want.
Thanks, this was interesting.
I’d rather say that some paths to alignment are about lacking certain capabilities.
If ultimately you want an AI to definitely-have some set of capabilities and lack some other set, you can get there by any combination of addition and subtraction (in the context of current AI that’s like a growing blob of capabilities). But if some of the capabilities we want are pretty specialized (like ones that relate to precisely interpreting our preferences), it might be faster to add or accelerate them somehow rather than waiting for capabilities to grow past them and then pruning back everything-that’s-not-what-we-want.
I think this is a good point, thanks.