It seems odd that the idealistic goal gets the standard name, while the dime-a-dozen failure mode gets a longer name that is more confusing.
I note that Wei says a similar thing happened to ‘act-based’:
My understanding is that “act-based agent” used to mean something different (i.e., a simpler kind of AI that tries to do the same kind of action that a human would), but most people nowadays use it to mean an AI that is designed to satisfy someone’s short-term preferences-on-reflection, even though that no longer seems particularly “act-based”.
Is there a reason why the standard terms are not being used to refer to the standard, short-term results?
(I suppose that economics assumes rational agents who know their preferences, so borrowing language from economics might lead to this situation with the choice of “short-term preferences” as the term.)
In the post Wei contrasts “current” and “actual” preferences. “Stated” vs “reflective” preferences also seem like nice alternatives.
It seems odd that the idealistic goal gets the standard name, while the dime-a-dozen failure mode gets a longer name that is more confusing.
I agree this is confusing.
Is there a reason why the standard terms are not being used to refer to the standard, short-term results?
As far as I know, Paul hasn’t explained his choice in detail. One reason he does mention, in this comment, is that in the context of strategy-stealing, preferences like “help me stay in control and be well-informed” do not make sense when interpreted as preferences-as-elicited, since the current user has no way to know if they are in control or well-informed.
In the post Wei contrasts “current” and “actual” preferences. “Stated” vs “reflective” preferences also seem like nice alternatives.
I think current = elicited = stated, but actual ≈ reflective, because it is possible that undergoing reflection isn’t a good way to find out our actual preferences. As Paul says: ‘There’s a hypothesis that “what I’d say after some particular idealized process of reflection” is a reasonable way to capture “actual preferences,” but I think that’s up for debate—e.g. it could fail if me-on-reflection is selfish and has values opposed to current-me, and certainly it could fail for any particular process of reflection and so it might just happen to be the case that there is no process of reflection that satisfies it.’
As far as I know, Paul hasn’t explained his choice in detail. One reason he does mention, in this comment, is that in the context of strategy-stealing, preferences like “help me stay in control and be well-informed” do not make sense when interpreted as preferences-as-elicited, since the current user has no way to know if they are in control or well-informed.
I agree this example adds nuance, and I’m unsure how to correctly categorise it.