There isn’t in that case; however, from Daniel’s comment (which he was using to make a somewhat different point):
AUP thinks very differently about building a nuclear reactor and then adding safety features than it does about building the safety features and then the dangerous bits of the nuclear reactor
I find this reassuring. If we didn’t have this, we would admit plans which are only low impact if not interrupted.
I don’t see why that’s necessary, since we‘re still able to do both plans?
Looking at it from another angle, agents which avoid freely putting themselves (even temporarily) in instrumentally convergent positions seem safer with respect to unexpected failures, so it might also be desirable in this case even though it isn’t objectively impactful in the classical sense.
I’m just trying to figure out if things could be neater. Many low-impact plans accidentally share prefixes with high-impact plans, and it feels weird if many of our orders semi-randomly require tweaking N.
There isn’t in that case; however, from Daniel’s comment (which he was using to make a somewhat different point):
I find this reassuring. If we didn’t have this, we would admit plans which are only low impact if not interrupted.
Is it possible to draw a boundary between Daniel’s case and mine?
I don’t see why that’s necessary, since we‘re still able to do both plans?
Looking at it from another angle, agents which avoid freely putting themselves (even temporarily) in instrumentally convergent positions seem safer with respect to unexpected failures, so it might also be desirable in this case even though it isn’t objectively impactful in the classical sense.
I’m just trying to figure out if things could be neater. Many low-impact plans accidentally share prefixes with high-impact plans, and it feels weird if many of our orders semi-randomly require tweaking N.
That’s a good point, and I definitely welcome further thought along these lines. I’ll think about it more as well!