My understanding is that the world model is more like a very coarse projection of the world than a simulation.
It’s not the case that the AGI has to be fooled into thinking the simulation is real, like in the Truman Show (I like the name tho!).
Davidad only tries to achieve ‘safety’, not alignment; indeed, the AI may be fully unaligned.
The proposal differs from simulation proposals like Jacob Cannell’s LOVE in a simbox, where one tries to align the values of the AI.
In davidad’s proposal the actual AGI is physically boxed and cannot interact with the world except by proposing policies inside this world model (which get formally checked in the second stage).
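For concreteness, here is a minimal sketch of that two-stage flow. All the names (`propose_policy`, `check`, the proof object) are hypothetical stand-ins I made up, not davidad's actual machinery:

```python
# A toy sketch of the propose-then-verify loop, assuming hypothetical
# agi.propose_policy() and world_model.check(policy, spec) interfaces.

def run_boxed_agi(agi, world_model, safety_spec, max_rounds=100):
    """The AGI never acts directly: it only emits candidate policies,
    and only a formally verified policy ever leaves the box."""
    for _ in range(max_rounds):
        policy = agi.propose_policy()                   # stage 1: proposal only
        proof = world_model.check(policy, safety_spec)  # stage 2: formal check
        if proof is not None:
            return policy, proof   # deployed together with its safety certificate
    return None, None              # nothing verifiable means nothing is deployed
```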
One way of thinking about it: davidad’s proposal is really an elaborate boxing protocol, but there are multiple boxes here (a toy sketch of the layers follows the list):
1. The physical Faraday cage that houses the hardware.
2. The interface constraint that restricts the AI to outputting only into the formal world model.
3. The formal cage achieved by verifying the proposed behaviour through mathmagic.
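Rendering the layering as code (again with invented predicates, not anything from the proposal itself):

```python
# Toy defense-in-depth check: an output only has effects if it passes
# every box. is_valid_policy and check are made up for illustration.

def output_escapes_box(raw_output, world_model, safety_spec):
    # Box 1: physical isolation; raw_output is assumed to arrive only
    # over the single audited channel the Faraday cage permits.
    # Box 2: the interface constraint; only well-formed world-model
    # policies are even parseable.
    if not world_model.is_valid_policy(raw_output):
        return False
    # Box 3: the formal cage; only outputs carrying a machine-checked
    # safety proof are ever acted on.
    return world_model.check(raw_output, safety_spec) is not None
```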
Although the technical challenges seem daunting, especially on such short timelines, this is not where I am most skeptical.
The key problem, as with all boxing proposals, is more of a governance and coordination problem.