I don’t think that the first one is what we want—wouldn’t we prefer it just not do that? My intuition is that impact isn’t endpoint-to-endpoint, but rather measured along the arc of the agent’s actions.
I think the second one is (more) reasonable, but that’s conditional on nothing going wrong with the machines. I think part of my crux on ex ante is indeed “the agent should be able to make low impact plans which might be high impact if we arbitrarily removed ‘components’ (like cooling the data centre)”, but also “the agent shouldn’t be able to make high impact plans seem low impact by some clever offset.” Perhaps it comes down to whether the measure allows “clever” offsetting, or whether all the ex ante things it can do really are low impact.
> I don’t think that the first one is what we want—wouldn’t we prefer it just not do that?
Sure. But if it did turn the world into diamond mines, I’d prefer it make the world more like it used to be along the dimension of ‘living beings existing’.
> My intuition is that impact isn’t endpoint-to-endpoint, but rather measured along the arc of the agent’s actions.
I agree with this intuition, but I think I want my notion of impact to be about how different the world is from normal worlds, which would push the AI to be conservative with respect to how many humans exist, how much nitrogen is in the atmosphere, etc.
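To make “how different the world is from normal worlds” slightly more concrete, here is a minimal, purely illustrative Python sketch. The feature names, the reference worlds, and the distance function are hypothetical stand-ins, not a proposal from either side of this exchange for how such a measure should actually be defined:

```python
# Toy sketch only: score a resulting world by how far it sits from a small,
# hypothetical reference set of "normal" worlds, along coarse features like
# the ones mentioned above (billions of humans, atmospheric nitrogen fraction).

NORMAL_WORLDS = [
    {"humans_bn": 8.0, "nitrogen_fraction": 0.78},
    {"humans_bn": 8.1, "nitrogen_fraction": 0.78},
    {"humans_bn": 7.9, "nitrogen_fraction": 0.78},
]

def distance_from_normal(world):
    """Absolute deviation from the mean 'normal' world, summed over features."""
    keys = NORMAL_WORLDS[0].keys()
    mean = {k: sum(w[k] for w in NORMAL_WORLDS) / len(NORMAL_WORLDS) for k in keys}
    return sum(abs(world[k] - mean[k]) for k in keys)

print(distance_from_normal({"humans_bn": 8.0, "nitrogen_fraction": 0.78}))  # ~0.0
print(distance_from_normal({"humans_bn": 0.0, "nitrogen_fraction": 0.40}))  # ~8.38
```

A world emptied of humans scores as very far from normal even if some other notion of “change” would call it tidy, which is the sense in which this framing pushes the agent to be conservative about those variables.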
> Perhaps it comes down to whether the measure allows “clever” offsetting, or whether all the ex ante things it can do really are low impact.
Yeah, I think that some types of offsetting are fake and other types are sensible, and you want to distinguish between them.
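One way the “fake” kind of offsetting shows up: an endpoint-to-endpoint score can be driven to zero by undoing the visible damage at the end, while a score accumulated along the arc of the plan still sees the intermediate states. Another toy Python sketch, with every feature, number, and plan hypothetical:

```python
# Toy comparison of two impact scores over a plan, represented as a sequence
# of world states (dicts of coarse, hypothetical features).

def endpoint_penalty(states, baseline):
    """Distance between the final state and the baseline only."""
    final = states[-1]
    return sum(abs(final[k] - baseline[k]) for k in baseline)

def trajectory_penalty(states, baseline):
    """Deviation from the baseline summed over every step of the plan."""
    return sum(abs(s[k] - baseline[k]) for s in states for k in baseline)

baseline = {"humans_bn": 8.0, "nitrogen_fraction": 0.78}

# "Turn Earth into diamond mines, then cleverly offset everything back to normal."
diamond_then_offset = [
    {"humans_bn": 8.0, "nitrogen_fraction": 0.78},  # before acting
    {"humans_bn": 0.0, "nitrogen_fraction": 0.40},  # mid-plan: diamond mines
    {"humans_bn": 8.0, "nitrogen_fraction": 0.78},  # after the offset
]

print(endpoint_penalty(diamond_then_offset, baseline))    # 0.0: the offset looks free
print(trajectory_penalty(diamond_then_offset, baseline))  # 8.38: the arc sees the mess
```

Under the stepwise score, offsetting still reduces the penalty for later states, but it can never erase the deviation that already happened, which is one rough way of separating sensible offsets from fake ones.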
> Sure. But if it did turn the world into diamond mines, I’d prefer it make the world more like it used to be along the dimension of ‘living beings existing’.
But if we allow this large of a thing to be called “low impact”, then we’re basically allowing anything, with some kind of clean-up afterwards. I think we just want the agent to be way more confined than that.
> I agree with this intuition, but I think I want my notion of impact to be about how different the world is from normal worlds, which would push the AI to be conservative with respect to how many humans exist, how much nitrogen is in the atmosphere, etc.
Hm. If you’re saying we should actually have it be programmed relative to these variables (or variables like those), I disagree—but if that’s the case, maybe we can postpone that debate until my next post.
> But if we allow this large of a thing to be called “low impact”, then we’re basically allowing anything, with some kind of clean-up afterwards.
Well, the clean-up afterward is pretty important and valuable! But I feel like you’re misunderstanding me—obviously I think that the initial ‘turning the Earth into diamond mines’ plan is pretty high-impact and shouldn’t be allowed absent detailed consultation with humans. I’m just saying that conditional on that plan being executed, the correct ‘low-impact’ thinking is in fact to implement the clean-up plan, and that therefore impact measures that discourage the clean-up plan are conceptually flawed.
> If you’re saying we should actually have it be programmed relative to these variables (or variables like those), I disagree.
I’m not sure about whether it should be programmed relative to intuitively natural-seeming variables (e.g. atmospheric nitrogen concentration and number of humans), but I think that as a result of its programming it should be conservative with respect to those variables.
> I’m just saying that conditional on that plan being executed, the correct ‘low-impact’ thinking is in fact to implement the clean-up plan, and that therefore impact measures that discourage the clean-up plan are conceptually flawed.
I assert that the low impact choice should be basically invariant of when you’re instantiated, and that the low impact thing to do is to make a few diamond mines without much of a fuss. You shouldn’t need any clean-up, because there shouldn’t be a mess.
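One way to cash out where these two positions come apart is what the measure is taken relative to: the same clean-up plan looks low impact against the pre-mess world and high impact against the world as it is when the agent comes online. A toy sketch, with the baselines and numbers purely hypothetical:

```python
# Hypothetical scenario: the agent is instantiated after the diamond-mine mess
# already exists, and is evaluating a clean-up plan that restores living beings.

def penalty(result, baseline):
    """Distance of a single resulting state from a baseline state."""
    return sum(abs(result[k] - baseline[k]) for k in baseline)

pre_mess_world   = {"living_beings_bn": 8.0}  # the world before the mines were built
world_at_startup = {"living_beings_bn": 0.0}  # the world when this agent comes online
after_cleanup    = {"living_beings_bn": 8.0}  # what the clean-up plan would produce

print(penalty(after_cleanup, pre_mess_world))    # 0.0: clean-up is the low-impact choice
print(penalty(after_cleanup, world_at_startup))  # 8.0: clean-up itself is a huge change
```

Whether “low impact” should be anchored to how the world used to be or to how it is when the agent starts acting is roughly the axis the clean-up disagreement sits on.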
The way I view it, the purpose of designing low-impact desiderata is that it might show us how to create a safety measure that doesn’t include any value-laden concepts.
The issue with saying that the AI should offset certain variables, such as nitrogen concentration, is that the choice of which variables need to be offset seems arbitrary. If you say, “Well, the AI should offset nitrogen, but not offset our neurons that now know about the AI’s existence,” then you are introducing values into the discussion of low impact, which kind of defeats the purpose.
Of course, the AI *should* offset the nitrogen, but whether it ought to be part of a low-impact measure is a separate question.