Ofer comments on Towards a New Impact Measure

Ofer 22 Sep 2018 22:49 UTC
Regarding all your arguments that use Intent Verification, my tentative position is that IV can’t be relied on to filter actions (as we’re still discussing under this sub-thread).
Nothing, but spending energy changes resources available, just as making a paperclip uses energy. If I make a paperclip, and then destroy the paperclip, that doesn’t decrease (and in fact, increases) the impact. Perhaps there is a way of doing this with available energy, but it doesn’t really matter because IV catches this. I mean, it’s basically just very obvious offsetting.
If I understand your argument correctly, you argue that the resources/energy device B spends while “undoing impact” decrease the value of utility functions in U, which is an additional impact that it might not be able to undo. But why wouldn’t it be able to counter that by saving enough energy/resources that humanity would otherwise waste before the end of the episode? (Perhaps that’s what you meant by “available energy”?)
So you start building a device, but before it’s completely specified you’ve already programmed the full intended specification in the device? That doesn’t make sense.
I don’t claim I know how to do it myself :) But for the agent it might be as easy as cloning itself and setting some modified utility function in the new clone (done in a smart way so as to not cause too much impact in any time step).
You said the agent has to seize power over 100 steps, but it can also make a singleton that will “revert” impact, before it’s free? This point is rather moot, as we could also suppose it’s already powerful.
As I argued above, for the agent, creating the device might be as easy as invoking a modified version of itself. In any case, I’m not sure I understand what “already powerful” means. In all the places I wrote “seizing power”, I believe I should have just written “some convergent instrumental goal”.
On the other hand, in this setting, it isn’t normal, and making a paper clip does not impede all of your optimal plans by one entire step. Therefore, a large penalty is applied.
Suppose in time step 4 the robot that creates paper-clips moves its arm 1 cm to the left. Does this impact most utility functions in U significantly less than 1 time-step worth of utility? How about a Roomba robot that drives 1 cm forward? It depends on how you define U, but I don’t see how we can assume this issue prevents the agent from building the device (again, compare a single action while building the device to a single action while making “conventional” progress on $u_A$: why should the former be more “wasteful” for most of U than the latter?).