The one thing that worries me is this: if you ask that function to consult people's own utility functions, to extract some form of extrapolation, then you no longer have the problem at the meta level, but you still have it at the level of what the AGI thinks people are (say, because it scrutinized their short-term desires).
That’s a good point. One could imagine a method of getting utility functions from human values that, maybe due to improper specification, returned some parts from short-term desires and some other parts from long-term desires, maybe even inconsistently. Though that still wouldn’t result in the AI acting like a human—it would do weirder things.