Update: what I mean, more exactly, is to build AIs from modules that are mostly well defined and well optimized. That means each module is already as sparse as we can make it: it contains only necessary weights, and it scores best among all models of its size on its dataset.
This suggests a solution to the alignment problem, actually.
Example architecture: a paperclip maximizer.
Layer 0: modules for robotics pathing and manipulation
Layer 1: modules for robotics perception
Layer 2: modules for laying out robotics on factory floors
Layer 3: modules for analyzing return on financial investment
Layer 4: high-level executive function, regressed against paperclips produced, whose purpose is to issue commands to the lower layers.
If we design some of the lower layers well enough, and disable any modification from higher layers, we can restrict what actions the paperclip maximizer is even capable of taking. With tight enough bounds, we can rule out the dangerous ones entirely.
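The restriction mechanism can be sketched in a few lines. Everything here is a hypothetical illustration, not an existing framework: lower-layer modules are frozen (the executive cannot modify them) and expose only a fixed command vocabulary, so the top-level optimizer can select among pre-approved actions but cannot invent new ones.

```python
from dataclasses import dataclass

# frozen=True: the executive layer cannot mutate this module's attributes,
# mirroring "disable any modification from higher layers".
@dataclass(frozen=True)
class LowerLayerModule:
    name: str
    allowed_commands: frozenset  # the module's fixed action vocabulary

    def execute(self, command: str) -> str:
        # Reject anything outside the pre-approved command set.
        if command not in self.allowed_commands:
            raise PermissionError(
                f"{self.name}: command {command!r} not in allowed set"
            )
        return f"{self.name} executed {command}"

class Executive:
    """Layer 4: issues commands, but only through the lower layers'
    fixed interfaces; it holds no other channel to the world."""
    def __init__(self, layers):
        self.layers = layers

    def act(self, layer_name: str, command: str) -> str:
        return self.layers[layer_name].execute(command)

# The maximizer can move the arm, because that action was whitelisted...
pathing = LowerLayerModule("pathing", frozenset({"move_arm", "grip"}))
exec_layer = Executive({"pathing": pathing})
print(exec_layer.act("pathing", "move_arm"))

# ...but any command outside the bounds raises PermissionError:
# exec_layer.act("pathing", "order_raw_steel")
```

The bound here is only as good as the whitelist: the sketch shows the mechanism, while the hard part the text points at is designing the lower layers' action sets tightly enough in the first place.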