Update: what I mean, more exactly, is to build AIs from modules that are mostly well defined and well optimized. That means each module is already as sparse as we can make it: it contains only necessary weights, and it scores best among all models of its size on its dataset.
This suggests a solution to the alignment problem, actually.
Example architecture: a paperclip maximizer.
Layer 0: modules for robotics pathing and manipulation
Layer 1: modules for robotics perception
Layer 2: modules for laying out robotics on factory floors
Layer 3: modules for analyzing return on financial investment
Layer 4: high-level executive function, regressed against paperclips produced, whose purpose is to issue commands to the lower layers.
If we design some of the lower layers well enough, and disable any modification from higher layers, we can restrict what actions the paperclip maximizer is even capable of taking. With tight enough bounds, we can rule out the dangerous ones entirely.
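The restriction mechanism can be sketched in a few lines. Everything here is a hypothetical illustration, not an existing framework: lower-layer modules are frozen (the executive cannot modify them) and expose only a fixed command vocabulary, so the top-level optimizer can select among pre-approved actions but cannot invent new ones.

```python
from dataclasses import dataclass

# frozen=True: the executive layer cannot mutate this module's attributes,
# mirroring "disable any modification from higher layers".
@dataclass(frozen=True)
class LowerLayerModule:
    name: str
    allowed_commands: frozenset  # the module's fixed action vocabulary

    def execute(self, command: str) -> str:
        # Reject anything outside the pre-approved command set.
        if command not in self.allowed_commands:
            raise PermissionError(
                f"{self.name}: command {command!r} not in allowed set"
            )
        return f"{self.name} executed {command}"

class Executive:
    """Layer 4: issues commands, but only through the lower layers'
    fixed interfaces; it holds no other channel to the world."""
    def __init__(self, layers):
        self.layers = layers

    def act(self, layer_name: str, command: str) -> str:
        return self.layers[layer_name].execute(command)

# The maximizer can move the arm, because that action was whitelisted...
pathing = LowerLayerModule("pathing", frozenset({"move_arm", "grip"}))
exec_layer = Executive({"pathing": pathing})
print(exec_layer.act("pathing", "move_arm"))

# ...but any command outside the bounds raises PermissionError:
# exec_layer.act("pathing", "order_raw_steel")
```

The bound here is only as good as the whitelist: the sketch shows the mechanism, while the hard part the text points at is designing the lower layers' action sets tightly enough in the first place.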