Designating some things as off-limits is a frame for defining agent behavior that’s different from frames that emphasize goals. The point is to sufficiently deconfuse this perspective so that we can train AIs that don’t circumvent boundaries.
An optimization frame says that this always happens, instrumentally valuable things always happen, unless agent’s values are very particular about avoiding them. But an agent doesn’t have to be primarily an optimizer, it could instead be primarily a boundary-preserver, and only incidentally an optimizer.
Designating some things as off-limits is a frame for defining agent behavior that’s different from frames that emphasize goals. The point is to sufficiently deconfuse this perspective so that we can train AIs that don’t circumvent boundaries.
An optimization frame says that this always happens, instrumentally valuable things always happen, unless agent’s values are very particular about avoiding them. But an agent doesn’t have to be primarily an optimizer, it could instead be primarily a boundary-preserver, and only incidentally an optimizer.