I think I just disagree with the ‘no offsetting’ desideratum as currently stated. My intuition is that if you do something which radically changes the world, saying “oops, looks like I just radically changed the world, better put it back” (or doing something similar ex ante) is what we want. Examples:
An agent turns the world into diamond mines. It then notices that there are no humans left, and thinks “hmmm, this is pretty radically different, better put them back”.
A system wants to cool a data centre, but notices that its plan for doing so would double the nitrogen concentration in the atmosphere, and notices that it could at low cost build some machines that bring the nitrogen concentration back to normal levels without doing anything else that seems crazy.
I don’t think that the first one is what we want—wouldn’t we prefer it just not do that? My intuition is that impact isn’t endpoint-to-endpoint, but rather measured along the arc of the agent’s actions.
I think the second one is (more) reasonable, but that’s conditional on nothing going wrong with the machines. I think part of my crux on ex ante is indeed “the agent should be able to make low impact plans which might be high impact if we arbitrarily removed ‘components’ (like cooling the data centre)”, but also “the agent shouldn’t be able to make high impact plans seem low impact by some clever offset.” Perhaps it comes down to whether the measure allows “clever” offsetting, or whether all the ex ante things it can do really are low impact.
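To make that contrast concrete, here is a minimal toy sketch in Python of an endpoint-to-endpoint impact measure versus one accumulated along the arc of the agent's actions. The one-dimensional world state, the numbers, and the function names are invented purely for illustration; nothing here is something either side has actually proposed.

```python
# Toy illustration: endpoint-based vs. path-based impact on a made-up 1-D world state.

def endpoint_impact(trajectory):
    """Impact as the difference between the final and the initial world state only."""
    return abs(trajectory[-1] - trajectory[0])

def path_impact(trajectory):
    """Impact accumulated step by step along the arc of the agent's actions."""
    return sum(abs(b - a) for a, b in zip(trajectory, trajectory[1:]))

quiet_plan  = [0.0, 0.1, 0.2, 0.2]    # a few diamond mines, no fuss
offset_plan = [0.0, 10.0, 10.0, 0.2]  # radical change, then a "clever" offset back

for name, plan in [("quiet", quiet_plan), ("offset", offset_plan)]:
    print(name, endpoint_impact(plan), path_impact(plan))
# Endpoint measure: both plans score 0.2, so the offset plan looks just as "low impact".
# Path measure: quiet plan ~0.2, offset plan ~19.8, so the excursion is not hidden.
```

The endpoint-style measure is the one that can be gamed by clever offsetting; the path-style measure captures the "measured along the arc" intuition, at the cost of also penalizing a sensible nitrogen-style clean-up once the excursion has already happened.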
I don’t think that the first one is what we want—wouldn’t we prefer it just not do that?
Sure. But if it did turn the world into diamond mines, I’d prefer it make the world more like it used to be along the dimension of ‘living beings existing’.
My intuition is that impact isn’t endpoint-to-endpoint, but rather measured along the arc of the agent’s actions.
I agree with this intuition, but I think I want my notion of impact to be about how different the world is from normal worlds, which would push the AI to be conservative with respect to how many humans exist, how much nitrogen is in the atmosphere, etc.
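One toy way to read "how different the world is from normal worlds" is as a deviation from reference values of a few summary variables. Everything below (the variable list, the reference values, the equal weighting) is a placeholder assumption, and whether hand-picking such variables is even legitimate is exactly what gets questioned later in this exchange.

```python
# Toy sketch: impact as deviation from a "normal world" along a few summary variables.
# The variable choices, reference values, and equal weighting are placeholders only.

NORMAL_WORLD = {"humans": 8e9, "nitrogen_partial_pressure_atm": 0.78}

def divergence_from_normal(world):
    """Sum of each variable's relative deviation from its normal-world value."""
    return sum(
        abs(world[k] - NORMAL_WORLD[k]) / abs(NORMAL_WORLD[k])
        for k in NORMAL_WORLD
    )

cooling_with_offset = {"humans": 8e9, "nitrogen_partial_pressure_atm": 0.78}  # nitrogen restored
cooling_no_offset   = {"humans": 8e9, "nitrogen_partial_pressure_atm": 1.56}  # nitrogen doubled

print(divergence_from_normal(cooling_with_offset))  # 0.0: conservative on both variables
print(divergence_from_normal(cooling_no_offset))    # 1.0: flags the doubled nitrogen
```

Note that a measure of this shape treats restoring the nitrogen as reducing impact, i.e. it is friendly to the "sensible" kind of offsetting described above.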
Perhaps it comes down to whether the measure allows “clever” offsetting, or whether all the ex ante things it can do really are low impact.
Yeah, I think that some types of offsetting are fake and other types are sensible, and you want to distinguish between them.
Sure. But if it did turn the world into diamond mines, I’d prefer it make the world more like it used to be along the dimension of ‘living beings existing’.
But if we allow this large of a thing to be called “low impact”, then we’re basically allowing anything, with some kind of clean-up afterwards. I think we just want the agent to be way more confined than that.
I agree with this intuition, but I think I want my notion of impact to be about how different the world is from normal worlds, which would push the AI to be conservative with respect to how many humans exist, how much nitrogen is in the atmosphere, etc.
Hm. If you’re saying we should actually have it be programmed relative to these variables (or variables like those), I disagree—but if that’s the case, maybe we can postpone that debate until my next post.

But if we allow this large of a thing to be called “low impact”, then we’re basically allowing anything, with some kind of clean-up afterwards.
Well, the clean-up afterward is pretty important and valuable! But I feel like you’re misunderstanding me—obviously I think that the initial ‘turning the Earth into diamond mines’ plan is pretty high-impact and shouldn’t be allowed absent detailed consultation with humans. I’m just saying that conditional on that plan being executed, the correct ‘low-impact’ thinking is in fact to implement the clean-up plan, and that therefore impact measures that discourage the clean-up plan are conceptually flawed.
If you’re saying we should actually have it be programmed relative to these variables (or variables like those), I disagree.
I’m not sure about whether it should be programmed relative to intuitively natural-seeming variables (e.g. atmospheric nitrogen concentration and number of humans), but I think that as a result of its programming it should be conservative with respect to those variables.
I’m just saying that conditional on that plan being executed, the correct ‘low-impact’ thinking is in fact to implement the clean-up plan, and that therefore impact measures that discourage the clean-up plan are conceptually flawed.
I assert that the low impact choice should be basically invariant of when you’re instantiated, and that the low impact thing to do is to make a few diamond mines without much of a fuss. You shouldn’t need any clean-up, because there shouldn’t be a mess.
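One possible reading of "invariant of when you're instantiated" is measuring impact against what would have happened had the agent done nothing from the moment it was switched on. The sketch below makes that assumed baseline explicit; the baseline choice, the variables, and the numbers are all illustrative rather than anything asserted in the discussion.

```python
# Toy sketch: impact relative to an inaction baseline that starts at instantiation.
# Under this reading the verdict depends only on what the agent itself changes,
# so it comes out the same whether the agent wakes up in a normal world or a ruined one.

def impact_vs_inaction(world_after_plan, world_if_inactive):
    """Total deviation of the plan's outcome from the do-nothing outcome.
    Worlds are dicts of invented summary variables."""
    return sum(
        abs(world_after_plan[k] - world_if_inactive[k])
        for k in world_if_inactive
    )

# Agent instantiated in an already-ruined world: no humans, a planet of diamond mines.
inaction_world = {"humans": 0.0, "diamond_mines": 1e6}

few_more_mines = {"humans": 0.0, "diamond_mines": 1e6 + 3}  # "without much of a fuss"
repopulate     = {"humans": 8e9, "diamond_mines": 1e6}      # the clean-up plan

print(impact_vs_inaction(few_more_mines, inaction_world))  # 3.0: low impact
print(impact_vs_inaction(repopulate, inaction_world))      # 8e9: high impact on this baseline
```

This is the mirror image of the distance-from-normal-worlds sketch above: the same clean-up plan that looks low impact there looks high impact here, which seems to be where the two positions actually diverge.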
The way I view it, the purpose of designing low-impact desiderata is that it might give us an idea of how to create a safety measure that doesn’t include any value-laden concepts.
The issue with saying that the AI should offset certain variables, such as the nitrogen concentration, is that the choice of which variables need to be offset seems arbitrary. If you say, “Well, the AI should offset the nitrogen, but not offset our neurons that now know about the AI’s existence”, then you are introducing values into the discussion of low impact, which kind of defeats the purpose.
Of course, the AI *should* offset the nitrogen, but whether it ought to be part of a low-impact measure is a separate question.