I feel like if you give the AI enough freedom for its intelligence to be helpful, you'd run into the same pitfalls as having the AI pick a goal you'd approve of. It's also not clear exactly which decisions you'd oversee. What if the AI convinces you that its actions are fine because you'd approve of its method of choosing them, and that its method is fine because you'd approve of the individual actions?