Endpoints are easier to predict than intermediate trajectories. I’m talking about what will happen inside almost any sufficiently powerful AGI, by virtue of it being sufficiently powerful.
IMO this kind of argument is a prime example of word games which outwardly seem like they impart deep truths about alignment, but actually tell you ~nothing. Somehow we’re supposed to deduce meaningful constraints on inner cognition of the policy, via… appeals to “eventually someone will build this”? To “long-horizon tasks demand this particular conceptualization of inner cognition and motivation”?
I don’t understand the point.

“Endpoints are easier to predict than intermediate trajectories” seems like a locally valid and relevant point to bring up. Then there is a valid argument here that there are lots of reasons people want to build powerful AGI, and that the argument about the structure of the cognition here is intended to apply to an endpoint where those goals are achieved. That is a valid response (if not a knockdown argument) to the interlocutor’s argument, which reasons from local observations and trends.
Maybe you were actually commenting on some earlier section, but I don’t see any word games in the section you quoted.
“Endpoints are easier to predict than intermediate trajectories” seems like a locally valid and relevant point to bring up.
I don’t think it’s true here. Why should it be true?
However, to clarify, I was calling the second quoted sentence a word game, not the first.
Then there is a valid argument here that there are lots of reasons people want to build powerful AGI
Agreed.
that the argument about the structure of the cognition here is intended to apply to an endpoint where those goals are achieved,
[People want an outcome with property X and so we will get such an outcome]
[One outcome with property X involves cognitive structures Y]
Does not entail
[We will get an outcome with property X and cognitive structures Y]
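To make the non-entailment explicit, here is one possible formalization (a sketch only; Realized, X, and Y are predicates I’m introducing for illustration, reading the bracketed claims as existentials over possible outcomes):

\[
\big(\exists o\,[\mathrm{Realized}(o) \land X(o)]\big) \;\land\; \big(\exists o\,[X(o) \land Y(o)]\big) \;\not\models\; \exists o\,[\mathrm{Realized}(o) \land X(o) \land Y(o)]
\]

A two-outcome countermodel: take outcomes \(\{a, b\}\) with \(\mathrm{Realized}(a)\), \(X(a)\), \(\lnot Y(a)\) and \(\lnot\mathrm{Realized}(b)\), \(X(b)\), \(Y(b)\). Both premises hold and the conclusion fails, because nothing forces the realized X-outcome to be the one that involves cognitive structures Y.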
But this is basically the word game!
“Whenever I talk about ‘powerful’ agents, I choose to describe them as having inner cognitive properties Y (e.g. the long-term consequentialism required for scheming)”
which vibes its way into “The agents are assumed to be powerful, how can you deny they have property Y?”
and then finally “People want ‘powerful’ agents and so will create them, and then we will have to deal with agents with inner cognitive property Y”
It sounds obviously wrong when I spell it out like this, but it’s what is being snuck in by sentences like
I’m talking about what will happen inside almost any sufficiently powerful AGI, by virtue of it being sufficiently powerful.
For convenience, I quote the fuller context:
Doomimir: [starting to anger] Simplicia Optimistovna, if you weren’t from Earth, I’d say I don’t think you’re trying to understand. I never claimed that GPT-4 in particular is what you would call deceptively aligned. Endpoints are easier to predict than intermediate trajectories. I’m talking about what will happen inside almost any sufficiently powerful AGI, by virtue of it being sufficiently powerful.
Do we know any other outcomes?