I think the miscommunication between you and Adele mostly comes down to under-specification of what features you want your test-AGI to have.
If you throttle its ability to optimize for its goals, see EY's and Adele's arguments.
If you don't throttle it in this way, you run into goal-specification and constraint-specification issues, instrumental-convergence concerns, and everything that goes along with them.
I think most people here will strongly feel that a (computationally) powerful AGI with any incentives is scary, and that any test versions should use at most a much-less-powerful one.
Sorry if I’ve misunderstood you at all.
If you specify the nature, goals, constraints, etc. of your test-AI more precisely, maybe I or someone else can try to give you more specific failure modes.