The research field of Fundamental Controllability Limits has the purpose of verifying (both the empirical soundness of the premises and the validity of the formal reasoning of):
1. Theoretical limits to controlling any AGI using any method of causation.
2. Threat models of AGI convergent dynamics that are impossible to control (given the limits in 1.).
3. Impossibility theorems, derived by contradicting ‘long-term AGI safety’ with the convergence result in 2.
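As an illustrative sketch of the intended logical shape (the notation is ours, not the source’s; read C_m(d) as ‘method m controls dynamic d’):

    \[
    \underbrace{\exists d^{\ast}.\;\bigl(\mathrm{Unsafe}(d^{\ast}) \land \forall m.\; \neg C_{m}(d^{\ast})\bigr)}_{\text{convergence result (2.)}}
    \;\;\land\;\;
    \underbrace{\bigl(S \Rightarrow \forall d.\;(\mathrm{Unsafe}(d) \Rightarrow \exists m.\; C_{m}(d))\bigr)}_{\text{long-term safety } S}
    \;\;\vdash\;\; \neg S
    \]

That is, if even one unsafe convergent dynamic escapes every method of control, long-term AGI safety is contradicted.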
~ ~ ~
Definitions and Distinctions
‘AGI convergent dynamic that is impossible to control’:
Iterated interactions of AGI internals with the connected surrounding environment that converge on (unsafe) conditions, where the space of interactions falls outside even one theoretical limit of control.
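As a hedged toy model of such a dynamic (the update rules and numbers below are illustrative assumptions, not the source’s threat model), iterated interactions can compound until some effects fall outside any fixed envelope a controller can monitor:

    # Toy model: each interaction compounds on earlier effects, so the
    # reachable set of conditions grows past a fixed detection envelope.
    # All values and update rules here are illustrative assumptions.

    def iterate_effects(seed: int, steps: int) -> set[int]:
        """Return the set of conditions reachable after `steps` iterated
        interactions, starting from `seed`."""
        effects = {seed}
        for _ in range(steps):
            effects |= {e * 2 for e in effects} | {e + 3 for e in effects}
        return effects

    DETECTABLE = set(range(100))  # a fixed, finite detection envelope

    escaped = iterate_effects(seed=1, steps=8) - DETECTABLE
    print(f"{len(escaped)} reachable conditions fall outside the envelope")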
‘Control’:
In theory, the control of system A over system B means that A can influence B so as to achieve A’s desired subset of the state space [Source: https://arxiv.org/pdf/2109.00484.pdf].
In practice, engineering control of AGI requires simulating or detecting any unsafe effects internally, and then preventing or correcting those effects externally.
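A minimal sketch of this detect-then-correct loop, assuming hypothetical `detect_unsafe` and `correct` procedures (none of these names come from the source):

    # Minimal sketch of 'control' in practice: observe the controlled
    # system, detect unsafe effects, and apply an external correction.
    # All names and types here are illustrative assumptions.
    from typing import Callable, Set

    State = int  # stand-in for a richer state representation

    def control_loop(
        step: Callable[[State], State],          # the system's own dynamics
        detect_unsafe: Callable[[State], bool],  # internal detection
        correct: Callable[[State], State],       # external correction
        state: State,
        desired: Set[State],
        max_steps: int = 100,
    ) -> bool:
        """Return True iff the controller keeps the system inside `desired`."""
        for _ in range(max_steps):
            state = step(state)
            if detect_unsafe(state):
                state = correct(state)
            if state not in desired:
                return False  # an effect escaped detection or correction
        return True

The theoretical limits at issue concern whether `detect_unsafe` and `correct` can, even in principle, cover all effects of a general system; the loop fails silently wherever either is incomplete.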
‘Long term’:
In theory: into perpetuity.
In practice: over a thousand years.
‘AGI safety’:
Ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition).
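On this minimum-threshold reading, safety is a predicate over ambient conditions; as a hedged sketch (the tracked variables and ranges below are illustrative assumptions only):

    # 'AGI safety' as a minimum-threshold predicate over ambient conditions.
    # The variables and survivable ranges are illustrative assumptions.

    HUMAN_SURVIVABLE_RANGES = {
        "surface_temperature_c": (-40.0, 50.0),
        "atmospheric_o2_fraction": (0.18, 0.24),
    }

    def agi_safe(ambient: dict) -> bool:
        """True iff every tracked ambient condition stays within the
        range humans need to survive."""
        return all(
            lo <= ambient[key] <= hi
            for key, (lo, hi) in HUMAN_SURVIVABLE_RANGES.items()
        )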
‘AGI’:
The notion of ‘artificial intelligence’ (AI) can be either “narrow” or “general”:
The notion of ‘narrow AI’ specifically implies:
a single domain of sense and action;
no possibility of modifying its own base code;
a single well-defined meta-algorithm;
that all aspects of its own agency/intention are fully defined by its builders/developers/creators.
The notion of ‘general AI’ specifically implies:
multiple domains of sense and action;
an intrinsic, non-reducible possibility of self-modification;
that the meta-algorithm is therefore effectively arbitrary; and hence
that it is inherently undecidable whether all aspects of its own agency/intention are fully defined by only its builders/developers/creators.
[Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3]
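The undecidability claim for general AI can be grounded in a standard computability argument; as a hedged sketch (this reduction via Rice’s theorem is our illustration, not the source’s own proof):

    # Sketch: deciding whether an arbitrary self-modifying program ever
    # alters its own goal definition is a nontrivial semantic property,
    # so by Rice's theorem no total, correct decider exists for it.
    # Everything below is an illustrative construction.

    def would_alter_goal(program: str) -> bool:
        """Hypothetical verifier: True iff `program` ever rewrites its
        own goal. No correct, always-terminating implementation can
        exist for arbitrary programs."""
        raise NotImplementedError("undecidable in general")

    def halting_reduction(machine: str, tape: str) -> str:
        # Build a program that alters its goal iff `machine` halts on
        # `tape`; a working `would_alter_goal` would therefore decide
        # the halting problem, which is impossible.
        return f"run({machine!r}, {tape!r})\nalter_goal()"

This is the sense in which the question stays answerable for narrow AI (no base-code self-modification, a fixed meta-algorithm) but not for general AI.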