The main difficulty, if there is one, is in “getting the function to play the role of the AGI values,” not in getting the AGI to compute the particular function we want in the first place.
Right, that is the problem (and IDK of anyone discussing this who says otherwise).
Another position would be that it’s probably easy to influence a few bits of the AI’s utility function, but not others. For example, it’s conceivable that, by doing capabilities research in different ways, you could increase the probability that the AGI is highly ambitious—e.g. tries to take over the whole lightcone, tries to acausally bargain, etc.—rather than being more satisficer-like. (IDK how to do that, but plausibly it’s qualitatively easier than alignment.) Then you could claim that this makes it half a bit more likely that you’ve made an FAI, given that an FAI would probably be ambitious. In this case, it does matter that the utility function is complex.