We can’t really speculate too strongly about the goals of an emerging AGI, so we have to consider all possibilities. “Bothering” is a human construct that an AGI is under no obligation to conform to.
An AI that isn’t using all its compute on its assigned task is one that gets replaced by one that is.
This is why I specify that this is an emerging AGI: a situation where the result of the iterator is so complex that only the thing iterating it understands the relationship between symbols and output. We can provide discriminators, as I also describe, to try to track an AGI’s alignment with the goals we want, but we absolutely can’t guarantee that every last bit of compute is dedicated to anything in particular.

With tight enough bounds, we can.
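To make the discriminator idea above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `propose_actions`, `alignment_score`, and `execute` are stand-in callables rather than functions from any existing system, and the 0.9 threshold is arbitrary. The point is only that a discriminator can gate which actions reach the actuators; it says nothing about what the planner computes internally.

```python
# Illustrative only: a discriminator that gates an agent's proposed actions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    description: str
    features: List[float]  # whatever representation the discriminator consumes


def run_with_discriminator(
    propose_actions: Callable[[], List[Action]],
    alignment_score: Callable[[Action], float],
    execute: Callable[[Action], None],
    threshold: float = 0.9,
) -> None:
    """Execute only the actions the discriminator judges aligned enough.

    This tracks alignment at the output boundary; it cannot guarantee
    what the planner computes internally.
    """
    for action in propose_actions():
        score = alignment_score(action)
        if score >= threshold:
            execute(action)
        else:
            # Rejected actions are surfaced for human review instead of executed.
            print(f"vetoed ({score:.2f} < {threshold}): {action.description}")


# Example wiring with trivial stand-ins:
run_with_discriminator(
    propose_actions=lambda: [Action("tighten bolt", [0.2, 0.9])],
    alignment_score=lambda a: a.features[-1],  # toy scorer
    execute=lambda a: print(f"executing: {a.description}"),
)
```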
Update: what I mean, more exactly, is to build AIs from modules that are mostly well defined and well optimized. This means they are already as sparse as we can make them (i.e., they contain only the necessary weights, and the module scores the best out of all models of its size on the dataset).
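As one concrete reading of “only necessary weights”, here is a minimal magnitude-pruning sketch in PyTorch. The toy network, the 10% keep fraction, and the helper name `prune_module` are illustrative assumptions, not anything from the original comment; an actual pipeline would also fine-tune the pruned module and benchmark it against other models of the same size.

```python
# Illustrative magnitude pruning: keep only the largest-magnitude weights
# in each linear layer, one way to approach "only necessary weights".
import torch
import torch.nn as nn


def prune_module(module: nn.Module, keep_fraction: float = 0.1) -> dict:
    """Zero out all but the top `keep_fraction` of weights (by magnitude) per layer.

    Returns the binary masks so the sparsity pattern can be re-applied
    after any further fine-tuning.
    """
    masks = {}
    with torch.no_grad():
        for name, layer in module.named_modules():
            if isinstance(layer, nn.Linear):
                w = layer.weight
                k = max(1, int(keep_fraction * w.numel()))
                # The k-th largest magnitude serves as the per-layer threshold.
                threshold = w.abs().flatten().kthvalue(w.numel() - k + 1).values
                mask = (w.abs() >= threshold).float()
                layer.weight.mul_(mask)
                masks[name] = mask
    return masks


# Example: a small stand-in module pruned to roughly 10% of its weights.
net = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
masks = prune_module(net, keep_fraction=0.1)
```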
This suggests a solution to the alignment problem, actually.
Example architecture: a paperclip maximizer.
Layer 0: modules for robotics pathing and manipulation
Layer 1: modules for robotics perception
Layer 2: modules for laying out robots on factory floors
Layer 3: modules for analyzing return on financial investment
Layer 4: a high-level executive function, regressed against paperclips made, whose purpose is to issue commands to the lower layers.
If we design some of the lower layers well enough, and disable any modification from higher layers, we can restrict what actions the paperclip maximizer is even capable of doing.
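A minimal sketch of that restriction, assuming hypothetical class and command names (nothing here comes from an existing library): the lower-layer modules have their parameters frozen so the executive cannot modify them, and the executive's entire action space is a small whitelisted command enum.

```python
# Illustrative sketch: an executive layer that can only issue whitelisted
# commands to frozen lower-layer modules, never modify them.
from enum import Enum, auto

import torch
import torch.nn as nn


class Command(Enum):
    MOVE_ARM = auto()
    FEED_WIRE = auto()
    CUT_AND_BEND = auto()
    REPORT_STATUS = auto()


class FrozenModule(nn.Module):
    """Wraps a pretrained lower-layer module and freezes its weights."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner
        for p in self.inner.parameters():
            p.requires_grad = False  # higher layers cannot train this module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.inner(x)


class Executive(nn.Module):
    """Layer 4: maps observations to one of the whitelisted commands.

    Only this module is trained (e.g. regressed against paperclips made);
    its action space is restricted to the Command enum above.
    """

    def __init__(self, obs_dim: int):
        super().__init__()
        self.policy = nn.Linear(obs_dim, len(Command))

    def forward(self, obs: torch.Tensor) -> Command:
        logits = self.policy(obs)
        return list(Command)[int(logits.argmax())]


# Layers 0-3 would be frozen, well-optimized modules; layer 4 only picks commands.
perception = FrozenModule(nn.Linear(32, 16))  # stand-in for a perception module
executive = Executive(obs_dim=16)

obs = perception(torch.randn(1, 32)).flatten()
print(executive(obs))  # e.g. Command.MOVE_ARM
```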