I upvoted this post, but I also wish it made clearer distinctions. In particular, I think it misses that the following can all be true:
1. Goal-directedness is natural and expected in the AI agents we will eventually build.
2. Goal-directedness in learning-based agents takes the form of contextual decision-influences (shards) steering cognition and behavior.
3. Increasing coherence looks like the shards within an agent tending to “not step on their own/one another’s feet” with respect to their own targets.
4. Agents will tend to resolve explicit incoherences within themselves over time, at least over what they currently care very much about.
5. Even as they resolve these incoherences, agents will not need or want to become utility maximizers globally, as that would require them to self-modify in a way inconsistent with their existing preferences.
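To make points 2–4 a bit more concrete, here is a rough toy sketch of contextual decision-influences bidding on actions. Everything in it (the `Shard` class, the activations, the bids) is invented for illustration; it is not a claim about how shards are actually implemented.

```python
# Rough toy sketch, purely for illustration: the Shard class, activations, and
# bids here are invented, not a claim about how shards are actually implemented.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Shard:
    name: str
    activation: Callable[[str], float]  # how strongly this shard fires in a given context
    bids: Dict[str, float]              # how much it favors each candidate action when active


def choose_action(context: str, shards: List[Shard], actions: List[str]) -> str:
    """Pick the action with the highest total contextually weighted bid."""
    totals = {action: 0.0 for action in actions}
    for shard in shards:
        weight = shard.activation(context)
        for action in actions:
            totals[action] += weight * shard.bids.get(action, 0.0)
    return max(totals, key=totals.get)


# Two decision-influences with different targets. "Increasing coherence" (points
# 3-4) would look like their bids rarely working at cross purposes on contexts
# they both care about, not like collapsing into a single global utility function.
sugar = Shard("sugar", lambda ctx: 1.0 if "bakery" in ctx else 0.1,
              {"eat_cake": 2.0, "keep_walking": 0.0})
health = Shard("health", lambda ctx: 0.8,
               {"eat_cake": -1.0, "keep_walking": 1.0})

print(choose_action("walking past a bakery", [sugar, health],
                    ["eat_cake", "keep_walking"]))  # -> "eat_cake"
```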
Strongly upvoted this comment, though I’m not up for reworking the post right now.
And I’m not sure I fully agree with it, particularly point 1. Goal-directedness as “argmax over actions [or higher-level mappings thereof] to maximise the expected value of a simple unitary utility function” and goal-directedness as “contextually activated heuristics downstream of historical reinforcement events” make such different predictions about the future (especially in the extremes) that I’m not actually sure I want to call what humans do “goal-directed”. It seems unhelpful to overload the term across such different decision-making procedures.
I do think one can define a spectrum of goal-directedness, but the extreme (argmax) end of that spectrum is anti-natural: the sophisticated systems we see in both biology and ML look more like “executing computations/cognition that historically correlated with higher performance on the objective function the system was selected for”, and that is a very important distinction.
“Contextually activated heuristics” seem to be of an importantly different type than “immutable terminal goals” or a “simple unitary utility function”.
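To spell out the type difference: under the argmax picture, behaviour is pinned down by a single utility function over outcomes plus a world model, roughly as in the sketch below (purely illustrative; the function names and toy numbers are made up), whereas the heuristics picture only ever does whatever the contextually activated heuristics of the moment suggest.

```python
# Purely illustrative sketch of "argmax over actions to maximise the expected
# value of a simple unitary utility function"; the names and numbers are made up.
from typing import Callable, Dict, List


def argmax_expected_utility(actions: List[str],
                            outcome_probs: Callable[[str], Dict[str, float]],
                            utility: Callable[[str], float]) -> str:
    """Score every action by the expected value of one global utility function
    over outcomes, then take the argmax."""
    def expected_utility(action: str) -> float:
        return sum(prob * utility(outcome)
                   for outcome, prob in outcome_probs(action).items())
    return max(actions, key=expected_utility)


# Toy usage: a two-action world model and a single utility function over outcomes.
print(argmax_expected_utility(
    ["eat_cake", "keep_walking"],
    outcome_probs=lambda a: ({"sugar_rush": 0.9, "nothing": 0.1} if a == "eat_cake"
                             else {"nothing": 1.0}),
    utility=lambda o: {"sugar_rush": 1.0, "nothing": 0.0}[o],
))  # -> "eat_cake"
```

Nothing in this procedure is ever dormant: every action gets scored against the same utility function in every situation, whereas a bundle of contextually activated heuristics only acts on whichever of them the current context triggers. That is why the two pictures make such different predictions in the extremes.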
For the purposes of this post, I think that distinction is very important, and I need to emphasise it more.