I don’t think anyone has proposed this. I think the most similar proposal is my instructions-following AGI (particularly since I’m also mostly thinking of just such a text prompt in a language model agent as the implementation).
My proposal, with its emphasis on checking, is aimed more at the range where the AGI is roughly human-level and above, whereas yours seems aimed at the truly superintelligent range. Mine keeps the human in charge of figuring out what they would’ve wanted, in case the AGI gets that wrong.
Other related work is linked in that post.
The above objections to CEV partly apply to your proposal. There is probably not just one thing X would’ve wanted with more consideration, since the conclusion may depend on the circumstances of that reflection.
I’m not sure that breaks the proposal; it could be that any of the several things X might’ve wanted would serve adequately.