I don’t see why that’s necessary, since we‘re still able to do both plans?
Looking at it from another angle, agents which avoid freely putting themselves (even temporarily) in instrumentally convergent positions seem safer with respect to unexpected failures, so it might also be desirable in this case even though it isn’t objectively impactful in the classical sense.
I’m just trying to figure out if things could be neater. Many low-impact plans accidentally share prefixes with high-impact plans, and it feels weird if many of our orders semi-randomly require tweaking N.
Is it possible to draw a boundary between Daniel’s case and mine?
I don’t see why that’s necessary, since we‘re still able to do both plans?
Looking at it from another angle, agents which avoid freely putting themselves (even temporarily) in instrumentally convergent positions seem safer with respect to unexpected failures, so it might also be desirable in this case even though it isn’t objectively impactful in the classical sense.
I’m just trying to figure out if things could be neater. Many low-impact plans accidentally share prefixes with high-impact plans, and it feels weird if many of our orders semi-randomly require tweaking N.
That’s a good point, and I definitely welcome further thought along these lines. I’ll think about it more as well!