In particular, we want to have it not want to intervene via other agents (including itself).
I don’t think this is true as stated. CDT handles multi-step plans by modelling them as influence on its own future decisions, so when LCDT mutilates its self-model to eliminate self-influence, it eliminates not just self-modification but also multi-step planning. And an agent that doesn’t plan over multiple steps seems fairly useless.
At the very least, its action space would need to be very rich; maybe then it could work.
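
To make the worry concrete, here is a minimal toy sketch in Python. Everything in it is assumed for illustration and is not taken from the LCDT post: the key-and-door problem, the payoffs, the function names, and especially the prior over the future self's action. It is a simplified reading of "cutting the link to future agents", not a faithful implementation of LCDT.

```python
# Toy two-step problem (hypothetical, for illustration only).
# Step 1: the agent chooses "get_key" (costs 1) or "wait" (free).
# Step 2: its future self chooses "open_door" or "idle".
# Opening the door pays 10 with the key and costs 2 without it.

STEP1_ACTIONS = ["get_key", "wait"]
STEP2_ACTIONS = ["open_door", "idle"]


def payoff(step1, step2):
    """Total utility of a complete two-step trajectory."""
    total = -1.0 if step1 == "get_key" else 0.0
    if step2 == "open_door":
        total += 10.0 if step1 == "get_key" else -2.0
    return total


def future_self_policy(step1):
    """What the step-2 self would actually do, given the step-1 choice."""
    return max(STEP2_ACTIONS, key=lambda a2: payoff(step1, a2))


def cdt_value(step1):
    """CDT-style evaluation: the step-1 choice influences the future self."""
    return payoff(step1, future_self_policy(step1))


def cut_link_value(step1, prior):
    """Simplified LCDT-style evaluation: the link to the future self is cut,
    so its action follows a fixed prior that ignores the step-1 choice."""
    return sum(p * payoff(step1, a2) for a2, p in prior.items())


# Assumed prior: the future self almost always idles, whatever happens now.
PRIOR = {"open_door": 0.05, "idle": 0.95}

cdt_choice = max(STEP1_ACTIONS, key=cdt_value)
cut_choice = max(STEP1_ACTIONS, key=lambda a1: cut_link_value(a1, PRIOR))

print("CDT-style choice:", cdt_choice)
print("Cut-link choice:", cut_choice)
```

With these assumed numbers, the CDT-style evaluation picks up the key because it expects its future self to respond by opening the door, while the cut-link evaluation does not, because under the fixed prior the investment rarely pays off. The result depends on the chosen prior; a prior that put enough weight on "open_door" would make fetching the key look worthwhile even with the link cut.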