Vladimir_Nesov comments on [missing post]

Vladimir_Nesov 12 Aug 2023 19:00 UTC
5 points
Designating some things as off-limits is a frame for defining agent behavior that’s different from frames that emphasize goals. The point is to sufficiently deconfuse this perspective so that we can train AIs that don’t circumvent boundaries.

An optimization frame says that such circumvention always happens: instrumentally valuable things always happen unless the agent’s values are very particular about avoiding them. But an agent doesn’t have to be primarily an optimizer; it could instead be primarily a boundary-preserver, and only incidentally an optimizer.
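
As a toy illustration of the contrast (my own sketch, not a proposal for how to train such agents): the function names, the action labels, the utility numbers, and the `off_limits` set below are all hypothetical. A pure optimizer ranks every available action by utility, so a boundary-crossing action wins whenever it scores highest. A boundary-preserver treats the boundary as a filter that runs first, and only optimizes over whatever remains.

```python
from typing import Callable, Iterable, Optional

Action = str  # hypothetical: actions are just labels in this toy example


def optimizer_choice(actions: Iterable[Action],
                     utility: Callable[[Action], float]) -> Action:
    """Pure optimizer: pick the highest-utility action, boundaries ignored."""
    return max(actions, key=utility)


def boundary_preserving_choice(actions: Iterable[Action],
                               utility: Callable[[Action], float],
                               crosses_boundary: Callable[[Action], bool]
                               ) -> Optional[Action]:
    """Boundary-preserver: drop off-limits actions first, then optimize.

    Returns None when every available action crosses a boundary, i.e. the
    agent declines to act rather than trading the boundary for utility.
    """
    permitted = [a for a in actions if not crosses_boundary(a)]
    if not permitted:
        return None
    return max(permitted, key=utility)


# Hypothetical scenario: bypassing a lock is instrumentally valuable
# (highest utility) but has been designated off-limits.
actions = ["ask_for_access", "wait", "bypass_lock"]
utility = {"ask_for_access": 2.0, "wait": 1.0, "bypass_lock": 5.0}.get
off_limits = {"bypass_lock"}

optimizer_pick = optimizer_choice(actions, utility)
boundary_pick = boundary_preserving_choice(
    actions, utility, lambda a: a in off_limits
)
```

In this sketch the boundary is not a large penalty folded into the utility; it is applied before any comparison of utilities, so no utility value can outweigh it. That ordering is what "primarily a boundary-preserver, and only incidentally an optimizer" is meant to suggest here.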