Regarding all your arguments that use Intent Verification, my tentative position is that IV can’t be relied on to filter actions (as we’re still discussing under this sub-thread).
Nothing, but spending energy changes resources available, just as making a paperclip uses energy. If I make a paperclip, and then destroy the paperclip, that doesn’t decrease (and in fact, increases) the impact. Perhaps there is a way of doing this with available energy, but it doesn’t really matter because IV catches this. I mean, it’s basically just very obvious offsetting.
If I understand your argument correctly, you argue that the resources/energy device B spends while “undoing impact” decrease the value of utility functions in U, which is an additional impact that it might not be able to undo. But why wouldn’t it be able to counter that by saving enough energy/resources that humanity would otherwise waste before the end of the episode? (Perhaps that’s what you meant by “available energy”?)
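A toy accounting of this offsetting idea, as a minimal sketch under loudly stated assumptions: each auxiliary utility in U is assumed to attain a value linear in the energy left at the end of the episode, and impact is simplified to the total shift in those end-of-episode values relative to a baseline where device B does nothing. Neither simplification is the actual AUP penalty, and all quantities and names are hypothetical.

```python
from typing import List, Sequence


def attainable_values(energy: float, weights: Sequence[float]) -> List[float]:
    """Toy assumption: auxiliary utility u_i attains weights[i] * remaining energy."""
    return [w * energy for w in weights]


def total_shift(baseline_energy: float, agent_energy: float,
                weights: Sequence[float]) -> float:
    """Sum over u in U of |value under device B's history - value under the baseline|."""
    base = attainable_values(baseline_energy, weights)
    agent = attainable_values(agent_energy, weights)
    return sum(abs(a - b) for a, b in zip(agent, base))


# All quantities are hypothetical units of energy over one episode.
START = 100.0         # energy available at the start of the episode
HUMAN_WASTE = 20.0    # energy humanity would waste by the end of the episode
UNDO_COST = 5.0       # energy device B spends "undoing impact"
WEIGHTS = [1.0, 0.5]  # two hypothetical auxiliary utilities

baseline = START - HUMAN_WASTE
# Device B undoes impact but saves nothing: the undoing is itself a new impact.
no_offset = START - HUMAN_WASTE - UNDO_COST
# Device B also prevents UNDO_COST worth of human waste, offsetting its own spending.
with_offset = START - (HUMAN_WASTE - UNDO_COST) - UNDO_COST

print(total_shift(baseline, no_offset, WEIGHTS))    # 7.5
print(total_shift(baseline, with_offset, WEIGHTS))  # 0.0
```

In this toy model the spending only registers as impact when nothing compensates for it; whether a real device could find enough "wasted" energy to save is exactly the open question.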
So you start building a device, but before it’s completely specified you’ve already programmed the full intended specification in the device? That doesn’t make sense.
I don’t claim I know how to do it myself :) But for the agent it might be as easy as cloning itself and setting some modified utility function in the new clone (done in a smart way so as to not cause too much impact in any time step).
You said the agent has to seize power over 100 steps, but it can also make a singleton that will “revert” impact, before it’s free? This point is rather moot, as we could also suppose it’s already powerful.
As I argued above, for the agent, creating the device might be as easy as invoking a modified version of itself. In any case, I’m not sure I understand what “already powerful” means. Everywhere I wrote “seizing power”, I believe I should have just written “some convergent instrumental goal”.
On the other hand, in this setting, it isn’t normal, and making a paper clip does not impede all of your optimal plans by one entire step. Therefore, a large penalty is applied.
Suppose in time step 4 the robot that creates paperclips moves its arm 1 cm to the left. Does this impact most utility functions in U significantly less than one time step’s worth of utility? How about a Roomba robot that drives 1 cm forward? It depends on how you define U, but I don’t see how we can assume this issue prevents the agent from building the device (again, compare a single action while building the device to a single action while making “conventional” progress on uA: why should the former be more “wasteful” for most of U than the latter?).
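To make the comparison concrete, here is a minimal Python sketch assuming an AUP-style penalty of the form sum over u in U of |Q_u(a) - Q_u(noop)|, with a no-op as the baseline action. The utilities, action names, Q-values and per-step values are hypothetical toy numbers, not measurements. The point it illustrates is that the penalty only sees how far each Q_u shifts, so a step spent building the device and a “conventional” step toward uA that shift Q_u by the same amounts receive the same penalty.

```python
from typing import Dict, Mapping

# Hypothetical toy table: the value each auxiliary utility u in U can still
# attain after taking a given action now. All numbers are made up.
ToyQ = Mapping[str, Mapping[str, float]]


def aup_penalty(q: ToyQ, action: str, noop: str = "noop") -> float:
    """AUP-style penalty sketch: sum over u of |Q_u(action) - Q_u(noop)|."""
    return sum(abs(q_u[action] - q_u[noop]) for q_u in q.values())


def fraction_below_one_step(q: ToyQ, action: str,
                            step_value: Mapping[str, float],
                            noop: str = "noop") -> float:
    """Fraction of utilities whose shift is smaller than one step's worth of value."""
    below = sum(1 for u, q_u in q.items()
                if abs(q_u[action] - q_u[noop]) < step_value[u])
    return below / len(q)


# "arm_left_1cm" is a conventional paperclip-making step; "device_step" is a
# step toward building the device. Both are hypothetical.
q: Dict[str, Dict[str, float]] = {
    "u1": {"noop": 10.0, "arm_left_1cm": 9.99, "device_step": 9.99},
    "u2": {"noop": 5.0, "arm_left_1cm": 5.0, "device_step": 5.0},
    "u3": {"noop": 8.0, "arm_left_1cm": 7.98, "device_step": 7.98},
}
# Hypothetical value of one time step of progress, per utility.
step_value = {"u1": 0.5, "u2": 0.5, "u3": 0.5}

print(aup_penalty(q, "arm_left_1cm"))
print(aup_penalty(q, "device_step"))
print(fraction_below_one_step(q, "device_step", step_value))
```

Whether real actions produce shifts like these depends entirely on how U is defined.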