I think Alexander Kruel has been writing about things like this for a while (but arguing in the opposite direction). Here’s an example.
I find his arguments unpersuasive so far, but steelmanning a little bit: you could argue that giving an AI any goal at all would basically entail making it grok humans, and that the jump from that to correctly holding human values would be short.