[Question] Is “Strong Coherence” Anti-Natural?
Related:
Contra “Strong Coherence”
why assume AGIs will optimize for fixed goals
Why The Focus on Expected Utility Maximisers?
Background and Core Concepts
In my earlier post, I operationalised “strong coherence” as:
Informally: a system has immutable terminal goals.
Semi-formally: a system’s decision making is well described as an approximation of argmax over actions (or higher level mappings thereof) to maximise the expected value of a single fixed utility function over states.
And contended that humans, animals (and learning based agents more generally?) seem to instead have values (“contextual influences on decision making”).
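For concreteness, the semi-formal condition can be written out as follows (a sketch in my own notation, not from the earlier post): for a context $c$, available actions $\mathcal{A}$, and a single fixed utility function $U$ over states $s$, a strongly coherent system’s policy is well approximated by

$$\pi(c) \;\approx\; \arg\max_{a \in \mathcal{A}} \; \mathbb{E}_{s \sim P(\cdot \mid c, a)}\big[U(s)\big]$$

with the same $U$ in every context; the “malleable values” picture described below instead has the effective objective itself vary with $c$.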
The shard theory account of value formation in learning based agents is something like:
Value shards are learned computational/cognitive heuristics causally downstream of similar historical reinforcement events
Value shards activate more strongly in contexts similar to those where they were historically reinforced
And I think this hypothesis of how values form in intelligent systems could be generalised out of an RL context to arbitrary constructive optimisation processes[1]. The generalisation may be something like:
Decision making in intelligent systems is best described as “executing computations/cognition that historically correlated with higher performance on the objective functions a system was selected for performance on”[2].
This seems to be an importantly different type of decision making from expected utility maximisation[3]. For succinctness, I’ll refer to systems of the above type as “systems with malleable values”.
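As a toy illustration of this contrast, here is a minimal sketch in Python (the shard representation, the similarity-based activation, and all the example numbers are my own illustrative assumptions, not claims about how real agents are implemented):

```python
import math

# Toy sketch only: the shard representation, the similarity-based activation,
# and all numbers below are illustrative assumptions, not claims about real agents.

def eu_maximiser(actions, utility, outcome_dist):
    """Strong coherence (semi-formal): argmax over actions of E[U(state)],
    with a single fixed utility function used in every context."""
    def expected_utility(action):
        return sum(p * utility(state) for state, p in outcome_dist(action).items())
    return max(actions, key=expected_utility)

def shard_agent(actions, shards, context):
    """Malleable values: contextually activated heuristics jointly steer the decision.
    Each shard prefers certain actions and activates more strongly the closer the
    current context is to the contexts where it was historically reinforced."""
    def activation(shard):
        distance = sum((x - y) ** 2 for x, y in zip(shard["home_context"], context))
        return math.exp(-distance)

    def score(action):
        return sum(activation(s) * s["preferences"].get(action, 0.0) for s in shards)

    return max(actions, key=score)

if __name__ == "__main__":
    actions = ["eat", "explore"]

    # Fixed-utility agent: same U over states, whatever the context.
    utility = {"fed": 1.0, "novelty": 0.4, "hungry": -1.0}
    outcome_dist = lambda a: {"eat": {"fed": 1.0},
                              "explore": {"novelty": 0.8, "hungry": 0.2}}[a]
    print(eu_maximiser(actions, utility.get, outcome_dist))      # -> eat

    # Shard agent: the winning value depends on the current context.
    shards = [
        {"home_context": (0.0, 0.0), "preferences": {"eat": 1.0}},      # reinforced near food
        {"home_context": (1.0, 1.0), "preferences": {"explore": 1.0}},  # reinforced in novel places
    ]
    print(shard_agent(actions, shards, context=(0.1, 0.0)))      # -> eat
    print(shard_agent(actions, shards, context=(0.9, 1.0)))      # -> explore
```

The only point of the toy is the structural difference: the first procedure optimises one fixed utility function in every context, while in the second, which values dominate a decision depends on how closely the current context resembles the contexts in which each heuristic was historically reinforced.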
The Argument
In my earlier post I speculated that “strong coherence is anti-natural”. To operationalise that speculation:
Premise 1: The generalised account of value formation is broadly accurate
At the least, intelligent systems in the real world form “contextually activated cognitive heuristics that influence decision making” as opposed to “immutable terminal goals”
Humans can program algorithms with immutable terminal goals in simplified virtual environments, but we don’t actually know how to construct sophisticated intelligent systems via design; we can only construct them as the product of search-like optimisation processes[4]
And intelligent systems constructed by search-like optimisation processes form malleable values instead of immutable terminal goals
I.e., real-world intelligent systems form malleable values
Premise 2: Systems with malleable values do not self-modify to have immutable terminal goals
Would you take a pill that would make you an expected utility maximiser[3]? I most emphatically would not.
If you accept the complexity and fragility of value theses, then self-modifying to become strongly coherent just destroys most of what the current you values.
For systems with malleable values, becoming “strongly coherent” is grossly suboptimal by their current values
A similar argument might extend to such systems constructing expected utility maximisers, were they given the option to do so
Conclusion 1: Intelligent systems in the real world do not converge towards strong coherence
Strong coherence is not the limit of effective agency in the real world
Idealised agency does not look like “(immutable) terminal goals” or “expected utility maximisation”
Conclusion 2: “Strong coherence” does not naturally manifest in sophisticated real-world intelligent systems
Sophisticated intelligent systems in the real world are the product of search-like optimisation processes
Such optimisation processes do not produce intelligent systems that are strongly coherent
And those systems do not converge towards becoming strongly coherent as they are subjected to more selection pressure, “scaled up”, or otherwise amplified
[1] E.g.:
* Stochastic gradient descent
* Natural selection/other evolutionary processes
[2] Intelligent systems are adaptation executors, not objective function maximisers.
[3] Of a single fixed utility function over states.
[4] E.g. I’m under the impression that humans can’t explicitly design an algorithm to achieve AlexNet accuracy on the ImageNet dataset. I think the self-supervised learning that underlies neocortical cognition is a much harder learning task. I believe that learning is the only way to create capable intelligent systems that operate in the real world given our laws of physics.