Great overview! I find this helpful.
Next to intrinsic optimisation daemons that arise through training internal to hardware, I suggest adding extrinsic optimising “divergent ecosystems” that arise through deployment and gradual co-option of (phenotypic) functionality within the larger outside world.
AI safety research has so far focussed more on internal code (particularly among CS/ML researchers) that is computed deterministically (within known state spaces, as mathematicians like to represent them), rather than on complex external feedback loops that are uncomputable, given Good Regulator Theorem limits and the inherent noise interference on signals propagating through the environment (as would be intuitive for some biologists and non-linear dynamics theorists).
So extrinsic optimisation is easier for researchers in our community to overlook. See this related paper by a physicist studying origins of life.
Cheers, Remmelt! I’m glad it was useful.
I think the extrinsic optimization you describe is what I’m pointing toward with the label “coordination failures,” which might properly be labeled “alignment failures arising uniquely through the interactions of multiple actors who, if deployed alone, would be considered aligned.”