This way the model behaves consistently and cannot act on hidden state to betray the user.
This is a standard software-engineering technique for building larger, more reliable systems; see stateless microservices or how ROS works.
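To make the statelessness point concrete, here is a minimal sketch (my own illustration, not from the original post): a handler that is a pure function of its request, so any replica gives the same answer for the same input and there is no hidden session state to drift.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """Everything the handler may depend on travels inside the request."""
    user_id: str
    payload: str

def handle(req: Request) -> str:
    # Pure function of its input: no globals, no session cache, so any
    # replica (or restart) produces the same answer for the same request.
    return f"echo:{req.user_id}:{req.payload}"

r = Request(user_id="u1", payload="hello")
# Two independent calls -- standing in for two replicas -- must agree.
print(handle(r) == handle(r))  # True
```

The reliability win is the same one microservices get: because behavior depends only on the request, you can restart, replicate, or swap the handler without changing what it does.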
For AI, consider the many documented cases where irrelevant context changes model behavior, such as the "grandma used to read me Windows license keys" jailbreak.
I interpret "aimability" as the model doing what the user most likely meant and nothing else; "aligned aimability" would then mean the probability of that goal being achieved is high.
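One way to make that reading concrete (the function name and threshold here are my own assumptions, not from the post): treat aimability as the probability that the outcome matches what the user most likely meant, and call it "aligned" when that probability clears a threshold.

```python
def aligned_aimability(p_goal_achieved: float, threshold: float = 0.95) -> bool:
    """Aimability: P(outcome == what the user most likely meant, and nothing else).

    'Aligned aimability' = that probability is high, i.e. above some threshold.
    The 0.95 default is arbitrary, purely for illustration.
    """
    if not 0.0 <= p_goal_achieved <= 1.0:
        raise ValueError("probabilities must lie in [0, 1]")
    return p_goal_achieved >= threshold

print(aligned_aimability(0.99))  # True
print(aligned_aimability(0.50))  # False
```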
Aimability doesn't mean reduced information.