Are you claiming that future powerful AIs won’t be well described as pursuing goals (aka being goal-directed)? This is the read I get from the “dragon” analogy you mention, but this can’t possibly be right because AI agents are already obviously well described as pursuing goals (perhaps rather stupidly). TBC, the goals that current AI agents end up pursuing are instructions in natural language, not something more exotic (see the toy sketch at the end of this comment).
(As far as I can tell, the word “optimizer” in “goal-directed optimizer” is either meaningless or redundant, so I’m ignoring that.)
Perhaps you just mean that future powerful AIs won’t ever be well described as consistently (e.g. across contexts) and effectively pursuing specific goals which they weren’t specifically trained or instructed to pursue?
Or that goal-directed behavior won’t arise emergently prior to humans being totally obsoleted by our AI successors (and possibly not even after that)?
TBC, I agree that some version of “deconfusing goal-directed behavior” is pretty similar to “deconfusing chairs” or “deconfusing consciousness”[1] (you might gain value from doing it, but only because you’ve ended up in a pretty weird epistemic state).
[1] See also “the meta problem of consciousness”.
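To make concrete what I mean by agents pursuing goals given as natural-language instructions, here’s a toy sketch of the standard agent loop. It’s purely illustrative: `call_llm`, the prompt format, and the `FINISH` convention are hypothetical stand-ins, not any particular framework’s API.

```python
# Purely illustrative sketch (not any particular framework or API): a minimal
# agent loop whose "goal" is just a natural-language instruction.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model."""
    return "FINISH: (placeholder result)"  # canned reply so the sketch runs


def run_agent(goal: str, max_steps: int = 10) -> str:
    """Pursue `goal` (a natural-language instruction) by repeatedly asking the
    model for the next action until it says it is finished."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"Actions so far: {history}\n"
            "Reply with the next action, or 'FINISH: <result>' when done."
        )
        action = call_llm(prompt)
        if action.startswith("FINISH:"):
            return action[len("FINISH:"):].strip()
        history.append(action)  # a real agent would execute the action here
    return "gave up"


print(run_agent("Summarize the new comments on this post."))
```

The point is just that the “goal” here is the instruction string itself, not some exotic internal objective.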
What do you mean by “well described”?
By “well described”, I mean that it’s a central example of how people typically use the word.
E.g., it matches most of the common characteristics in the cluster around the word “goal”.
In the same way, something can be well described as a chair if it has a chair-like shape and people use it for sitting.
(Separately, I was confused by the original footnote. Is Alex claiming that deconfusing goal-directedness is a thing that no one has tried to do? (Seems wrong so probably not?) Or that it’s strange to be worried when the argument for worry depends on something so fuzzy that you need to deconfuse it? I think the second one after reading your comment, but I’m still unsure. Not important to respond.)
He means the second one.
Seems true in the extreme (if you have no idea what something is, how can you reasonably be worried about it?), but less strange the further you get from that.