My two biggest objections to that kind of plan:

1) It feels like passing the buck, which is a known antipattern in thinking about AI.

2) With a “soft” self-improving entity, like a team of people and AIs, most invariants you can define will also be “soft” and prone to drift over many iterations.

That’s why I’d prefer a more object-level solution to alignment, if we can have it. But maybe we can’t have it.

> 1) It feels like passing the buck, which is a known antipattern in thinking about AI.

Not sure what you mean by this, or by “more object-level solution to alignment”. Could you explain more?

> 2) With a “soft” self-improving entity, like a team of people and AIs, most invariants you can define will also be “soft” and prone to drift over many iterations.

Yeah, I agree with this part. I think defining an invariant that is both “good enough” and achievable/provable will be very hard, or maybe just impossible.

> Not sure what you mean by this, or by “more object-level solution to alignment”. Could you explain more?

The proposed setup can be seen as a self-improving AI, but a pretty opaque one. To explain why it makes a particular decision, we must appeal to anthropomorphism, like “our team of researchers wouldn’t do such a stupid thing”. That seems prone to wishful thinking. I would prefer to launch an AI for which at least some decisions have non-anthropomorphic explanations.