If there were no fact of the matter about what you want overall, there would be no fact of the matter about whether an AI is aligned with you, which would mean there is no alignment problem.
The referenced post seems to apply specifically to IRL (inverse reinforcement learning), which is purely behavioral and doesn't take information about the nature of the agent into account. (E.g. the fact that humans evolved by natural selection tells us a lot about what they probably want, and information about their brains could tell us how intelligent they are.) It's also only an epistemic point about the difficulty of externally inferring values, not a claim that those values don't exist.
See my sequence "Reducing Goodhart" for what I (or at least the me of a few years ago) think the impact on the alignment problem is.
the fact that humans evolved by natural selection tells us a lot about what they probably want,
Sure. But only if you already know what evolved creatures tend to want. That is, once you have already made interpretive choices in one case, you can get some information about how well they hang together with other cases.
"It sure seems like there's a fact of the matter" is not a very forceful argument to me, especially in light of results like the impossibility of uniquely fitting a rationality model and utility function to human behavior.
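To make that concrete, here is a minimal toy sketch (my own illustration, not taken from the impossibility result itself): a utility-maximizer and an anti-rational minimizer with the opposite utility function produce identical behavior, so behavior alone can't pick out which (rationality model, utility function) pair is the "real" one.

```python
# Toy illustration: two different (rationality model, utility function) pairs
# that generate the exact same observable behavior, so behavior alone cannot
# tell us which pair describes the agent.

actions = ["left", "right"]

# Pair 1: a maximizer with a utility function that prefers "right".
u_max = {"left": 0.0, "right": 1.0}
def maximizer_policy(utility):
    return max(actions, key=lambda a: utility[a])

# Pair 2: an anti-rational minimizer with the opposite utility function.
u_min = {"left": 1.0, "right": 0.0}
def minimizer_policy(utility):
    return min(actions, key=lambda a: utility[a])

print(maximizer_policy(u_max))  # "right"
print(minimizer_policy(u_min))  # "right" -- identical behavior, opposite values
```

Any prior that breaks this tie has to come from somewhere other than the behavioral data, which is the same interpretive-choice problem mentioned above.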