Your comment focuses on GPT4 being “pretty good at extracting preferences from human data” when the stronger part of the argument seems to be that “it will also generally follow your intended directions, rather than what you literally said”.
I agree with you that it was obvious in advance that a superintelligence would understand human value.
However, it sure sounded like you thought we’d have to specify each little detail of the value function. GPT4 seems to suggest that the biggest issue will be a situation where:
1) The AI has an option that would produce a lot of utility if you take one position on an exotic philosophical thought experiment and very little if you take the other side.
2) The existence of powerful AI means that the thought experiment is no longer exotic.