
Fundamental Controllability Limits


The research field Fundamental Controllability Limits aims to verify (both the empirical soundness of the premises and the validity of the formal reasoning behind):

  1. Theoretical limits to controlling any AGI using any method of causation.

  2. Threat models of AGI convergent dynamics that are impossible to control (given the limits in 1).

  3. Impossibility theorems, obtained by contradiction: ‘long-term AGI safety’ is shown to be inconsistent with the convergence result in (2), as sketched below.
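
Schematically, the intended argument runs as follows (a minimal sketch, not taken from the source; D and S are illustrative symbols):

    % D: an AGI convergent dynamic that is impossible to control, per the limits in (1).
    % S: 'long-term AGI safety' holds.
    % The convergence result in (2) asserts that D obtains, and that
    D \Rightarrow \neg S
    % so the assumption of S together with (2) yields a contradiction,
    % which is the impossibility theorem in (3).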

~ ~ ~

Definitions and Distinctions

‘AGI convergent dynamic that is impossible to control’:

Iterated interactions of AGI internals with the connected surrounding environment that converge on (unsafe) conditions, where the space of interactions falls outside even one theoretical limit of control.
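
One way to make this definition precise (a minimal formal sketch, not taken from the source; the symbols f, s_t, e_t, R, \mathcal{C}, and S_{\text{safe}} are illustrative assumptions):

    % Iterated interactions of AGI internals s_t with the connected environment e_t:
    s_{t+1} = f(s_t, e_t)
    % 'Impossible to control': for every control method c (any method of causation),
    % the set R(f \mid c) of conditions the interactions converge on under c
    % is not contained in the range S_{\text{safe}} that humans need to survive:
    \forall c \in \mathcal{C}: \quad R(f \mid c) \not\subseteq S_{\text{safe}}

On this reading, ‘falls outside even one theoretical limit of control’ means that no available method of causation keeps the converged-on conditions within the survivable range.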

‘Control’:

‘Long term’:

‘AGI safety’:

Ambient conditions/contexts around planet Earth, as changed by the operation of AGI, fall within the environmental range that humans need to survive (a minimum-threshold definition).

‘AGI’:

The notion of ‘artificial intelligence’ (AI) can be either “narrow” or “general”; the two notions are contrasted schematically below.

The notion of ‘narrow AI’ specifically implies:

  1. a single domain of sense and action.

  2. no possibility of modifying its own base code.

  3. a single well-defined meta-algorithm.

  4. that all aspects of its own agency/intention are fully defined by its builders/developers/creators.

The notion of ‘general AI’ specifically implies:

  1. multiple domains of sense/action;

  2. an intrinsic, non-reducible possibility of self-modification;

  3. and therefore that the meta-algorithm is effectively arbitrary; hence

  4. that it is inherently undecidable whether all aspects of its own agency/intention are fully defined by only its builders/developers/creators.

[Source: https://mflb.com/ai_alignment_1/si_safety_qanda_out.html#p3]
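
The two notions above can be contrasted in a small schematic (a hypothetical sketch in Python; the class and field names are assumptions for illustration, not from the source):

    # Hypothetical schematic of the narrow-vs-general distinction above.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AINotion:
        domains_of_sense_and_action: str                  # "single" or "multiple"
        self_modification_possible: bool                  # can it modify its own base code?
        meta_algorithm: str                               # "well-defined" or "effectively arbitrary"
        intentions_fully_builder_defined: Optional[bool]  # None = inherently undecidable

    narrow_ai = AINotion(
        domains_of_sense_and_action="single",
        self_modification_possible=False,
        meta_algorithm="single, well-defined",
        intentions_fully_builder_defined=True,
    )

    general_ai = AINotion(
        domains_of_sense_and_action="multiple",
        self_modification_possible=True,        # intrinsic, non-reducible
        meta_algorithm="effectively arbitrary",
        intentions_fully_builder_defined=None,  # inherently undecidable
    )

The point of the schematic is only that the last field changes character: for narrow AI it takes a definite value, while for general AI no definite value can be assigned.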

The Control Problem: Unsolved or Unsolvable?

Remmelt, 2 Jun 2023 15:42 UTC
54 points
46 comments, 14 min read, LW link

Projects I would like to see (possibly at AI Safety Camp)

Linda Linsefors, 27 Sep 2023 21:27 UTC
22 points
12 comments, 4 min read, LW link

Limits to the Controllability of AGI

20 Nov 2022 19:18 UTC
10 points
2 comments, 9 min read, LW link

[Question] Help me solve this problem: The basilisk isn’t real, but people are

canary_itm, 26 Nov 2023 17:44 UTC
−19 points
4 comments, 1 min read, LW link

On the possibility of impossibility of AGI Long-Term Safety

Roman Yen, 13 May 2023 18:38 UTC
6 points
3 comments, 9 min read, LW link

Why mechanistic interpretability does not and cannot contribute to long-term AGI safety (from messages with a friend)

Remmelt, 19 Dec 2022 12:02 UTC
−3 points
9 comments, 31 min read, LW link

The limited upside of interpretability

Peter S. Park, 15 Nov 2022 18:46 UTC
13 points
11 comments, 1 min read, LW link

List #3: Why not to assume on prior that AGI-alignment workarounds are available

Remmelt, 24 Dec 2022 9:54 UTC
4 points
1 comment, 3 min read, LW link

How ‘Human-Human’ dynamics give way to ‘Human-AI’ and then ‘AI-AI’ dynamics

27 Dec 2022 3:16 UTC
−2 points
5 comments, 2 min read, LW link
(mflb.com)

Challenge to the notion that anything is (maybe) possible with AGI

1 Jan 2023 3:57 UTC
−27 points
4 comments, 1 min read, LW link
(mflb.com)

What if Alignment is Not Enough?

WillPetillo, 7 Mar 2024 8:10 UTC
12 points
24 comments, 9 min read, LW link

Lenses of Control

WillPetillo, 22 Oct 2024 7:51 UTC
12 points
0 comments, 9 min read, LW link

Formalize the Hashiness Model of AGI Uncontainability

Remmelt, 9 Nov 2024 16:10 UTC
5 points
0 comments, 1 min read, LW link
(docs.google.com)