If the AI wants (in the usual sense of the word) to achieve things for which bad aggression is instrumentally useful, and you (arguendo) completely prevent it from using bad aggression, you have barely slowed it down. It will still achieve the things it wants almost as fast, just using social manipulation or whatever instead.
The solution is, as ever, to make an AI that wants to achieve good things and not bad things.
Designating some things as off-limits is a frame for defining agent behavior that’s different from frames that emphasize goals. The point is to sufficiently deconfuse this perspective so that we can train AIs that don’t circumvent boundaries.
An optimization frame says this always happens: instrumentally valuable things get done unless the agent's values are very particular about avoiding them. But an agent doesn't have to be primarily an optimizer; it could instead be primarily a boundary-preserver, and only incidentally an optimizer.
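As a minimal sketch of that distinction (illustrative only; none of the names below come from the discussion, and the `Boundary`/`propose_actions`/`goal_score` pieces are stand-ins), here is a toy agent step in which boundary preservation is the top-level operation and goal-seeking optimization only runs over whatever actions survive it:

```python
# Toy sketch: boundary preservation first, optimization only over what remains.
# All names and the action set are hypothetical, for illustration only.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Boundary:
    """A predicate over (state, action) pairs that must never be violated."""
    name: str
    is_respected: Callable[[dict, str], bool]


def propose_actions(state: dict) -> List[str]:
    # Stand-in for whatever internal search the agent runs.
    return ["ask_politely", "manipulate", "coerce"]


def goal_score(state: dict, action: str) -> float:
    # Stand-in for the agent's goal-directed evaluation of actions.
    return {"ask_politely": 0.4, "manipulate": 0.9, "coerce": 1.0}[action]


def boundary_preserving_step(state: dict, boundaries: List[Boundary]) -> Optional[str]:
    # Primary concern: discard anything that crosses a boundary.
    permitted = [
        a for a in propose_actions(state)
        if all(b.is_respected(state, a) for b in boundaries)
    ]
    if not permitted:
        # Preserving the boundary dominates achieving the goal.
        return None
    # Only incidentally an optimizer: pick the best of what remains.
    return max(permitted, key=lambda a: goal_score(state, a))


if __name__ == "__main__":
    no_aggression = Boundary(
        name="no bad aggression or manipulation",
        is_respected=lambda state, action: action not in {"manipulate", "coerce"},
    )
    print(boundary_preserving_step({}, [no_aggression]))  # -> "ask_politely"
```

The structural point is only in the ordering: the filter is not a penalty term inside the score, so the goal-seeking part never gets to trade a boundary violation against a higher-scoring outcome.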