The desires of an agent are defined by its preferences. “This is a paperclip maximizer which does not want to maximize paperclips” is a contradiction in terms.
I’m not sure “preference” is a powerful enough term to capture an agent’s true goals, however defined. Consider any of the standard preference reversals: a heavy cigarette smoker, for example, might prefer to buy and consume their next pack in a Near context, yet prefer to quit in a Far one. The apparent contradiction follows quite naturally from time discounting, yet neither interpretation of the person’s preferences is obviously wrong.
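To make the discounting point concrete, here is a minimal sketch. It assumes a hyperbolic discount curve, since an exponential curve with a fixed rate would not produce a reversal. The payoff sizes, arrival times, and the parameters `k` and `d` are all made up for illustration; none of them come from data.

```python
# Hypothetical payoffs: (size of reward, time at which it arrives).
SMOKE = (10.0, 10)   # smaller reward, arrives sooner
QUIT = (30.0, 15)    # larger reward, arrives later


def hyperbolic(reward, delay, k=1.0):
    """Hyperbolic discounting: value falls off as 1 / (1 + k * delay)."""
    return reward / (1.0 + k * delay)


def exponential(reward, delay, d=0.9):
    """Exponential discounting: value shrinks by a constant factor d per step."""
    return reward * d ** delay


def choice(now, discount):
    """Return the option with the higher discounted value as seen from time `now`."""
    values = {}
    for name, (reward, when) in (("smoke", SMOKE), ("quit", QUIT)):
        values[name] = discount(reward, when - now)
    return max(values, key=values.get)


print(choice(0, hyperbolic))    # Far view  -> quit
print(choice(10, hyperbolic))   # Near view -> smoke
print(choice(0, exponential))   # Far view  -> quit
print(choice(10, exponential))  # Near view -> quit (no reversal)
```

Under these assumed numbers the same agent, with the same payoffs, ranks the options differently depending only on how far away they are, which is the Near/Far reversal described above.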
I’ve seen “preferences” used as shorthand for “utility function”, saving 5 keystrokes. That was the intended use here. Point taken; alternate phrasings welcome.