In such a scenario, we could speak of “FIA”: friendly intelligence augmentation. A basic premise of existing FAI discourse is that the true human utility function must first be determined, and that the values which make an AI human-friendly would then be extrapolated from it.
This certainly gets proposed a lot. But isn’t it LessWrongian consensus that this is backwards? That the only way to build an FAI is to build an AI that will extrapolate and adopt the humane utility function on its own (since human values are too complicated for mere humans to state explicitly)?
I endorse this idea, but have a minor nitpick: