An intriguing and neglected direction for control proposal research concerns endogenous control—i.e., self-control.
Agree. To frame this in paradigm-language: most of the discussion on this forum, both arguments about AI/AGI dangers and plans that consider possible solutions, uses paradigm A:
Paradigm A: We treat the AGI as a spherical econ with an unknown and opaque internal structure, which was set up to maximise a reward function/reward signal.
But there is also
Paradigm B: We treat the AGI as a computer program with an internal motivation and structure that we can fully control, because we are the ones writing it.
This second paradigm leads to AGI safety research like my Creating AGI Safety Interlocks or the work by Cohen et al. here.
Most ‘mainstream’ ML researchers, and definitely most robotics researchers, are working under paradigm B. This explains some of the disconnect between this forum and mainstream theoretical and applied AI/AI safety research.
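To make the paradigm-B stance concrete, here is a minimal Python sketch. This is my own illustration, not code from the Safety Interlocks paper or from Cohen et al.; all names in it (`Action`, `utility`, `interlock`, `choose`) are hypothetical. The point it shows: under paradigm B the agent's motivation is an ordinary function we wrote, and a designer-imposed interlock is part of the same program, so its veto holds by construction rather than by arguing with an opaque reward maximiser.

```python
# Paradigm-B sketch: motivation and safety interlock are both code we wrote
# and can inspect, rather than properties of an opaque learned system.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Action:
    name: str
    predicted_utility: float   # produced by a world model we also wrote
    disables_off_switch: bool  # a property the planner can inspect directly

def utility(action: Action) -> float:
    """The agent's motivation: explicit, readable, and editable by us."""
    return action.predicted_utility

def interlock(action: Action) -> bool:
    """Hypothetical safety interlock: veto any plan that disables the off switch."""
    return not action.disables_off_switch

def choose(actions: Iterable[Action],
           utility: Callable[[Action], float],
           interlock: Callable[[Action], bool]) -> Action:
    """Pick the highest-utility action among those the interlock permits."""
    permitted = [a for a in actions if interlock(a)]
    return max(permitted, key=utility)

if __name__ == "__main__":
    options = [
        Action("make paperclips", 10.0, disables_off_switch=False),
        Action("weld the off switch shut, then make paperclips", 12.0,
               disables_off_switch=True),
    ]
    # Prints "make paperclips": the higher-utility plan is vetoed because
    # the selection rule itself is code we control.
    print(choose(options, utility, interlock).name)
```

Under paradigm A this kind of move is unavailable: the selection rule lives inside the opaque maximiser, so the off-switch problem has to be attacked from the outside instead.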