This specific problem could easily be fixed, but the problem of the goal not being what we think it is, remains.
See also Kaj’s example: https://www.lesswrong.com/posts/Ez4zZQKWgC6fE3h9G/almost-every-powerful-algorithm-would-be-manipulative#vhZ9uvMwiMCepp6jH
This specific problem could easily be fixed, but the problem of the goal not being what we think it is, remains.
See also Kaj’s example: https://www.lesswrong.com/posts/Ez4zZQKWgC6fE3h9G/almost-every-powerful-algorithm-would-be-manipulative#vhZ9uvMwiMCepp6jH