An S design that satisfies M(u-v) more than the default is one where Δ(u-v)>0, i.e. Δu>Δv (1). An S design that satisfies M(εu+v) more than the default is one where Δ(εu+v)>0, i.e. εΔu>-Δv (2).
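To make conditions (1) and (2) concrete, here is a minimal sketch; the particular numbers, the value of ε, and the helper names are illustrative assumptions, not anything from the discussion:

```python
# Toy check of preference conditions (1) and (2) above.
# du and dv stand for the changes Delta-u and Delta-v that a candidate design S
# produces relative to the default; eps is the small weight on u.

def prefers_u_minus_v(du, dv):
    """M(u - v) favours S over the default iff Delta(u - v) > 0, i.e. du > dv."""
    return du - dv > 0

def prefers_eps_u_plus_v(du, dv, eps=0.01):
    """M(eps*u + v) favours S over the default iff eps*du + dv > 0, i.e. eps*du > -dv."""
    return eps * du + dv > 0

examples = [
    (10.0,  2.0),  # u and v both improve: both maximisers favour S
    (10.0, -2.0),  # u improves at a real cost to v: only M(u - v) favours S
    ( 1.0,  2.0),  # v improves more than u: only M(eps*u + v) favours S
]

for du, dv in examples:
    print(du, dv, prefers_u_minus_v(du, dv), prefers_eps_u_plus_v(du, dv))
```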
That sounds like trying to extract useful work out of ignorance!
I am trying to extract work from ignorance, the same way that I did with “resource gathering”. An AI that is ignorant of its utility will try to gather power and resources, and preserve flexibility; that’s a kind of behaviour you can get mainly from an ignorant AI.
all possible vs such that Δv≥0
Unlikely, because I’d generally design with equal chances of v and -v (or at least comparable chances).
This is sort of an anti-first law, in that the agent will choose inaction, or pursue its duties, instead of helping out others; but only when it would help too much!
We don’t know that v is nice; in fact, it’s likely nasty, with -v also being nasty. So we don’t want either of them to be strongly maximised.
What happens here is that as Δu increases and S(u) uses up resources, the probability that Δv will remain bounded (in either direction) decreases strongly. So the best way of keeping Δv bounded is not to burn up many resources on Δu.
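A toy version of that claim, purely as an illustration: assume (my assumption, not anything stated above) that the spread of Δv grows linearly with the resources r spent on Δu. Then the probability that Δv stays within a fixed bound falls off quickly with r:

```python
import math

# Toy model of the claim above (the linear-growth assumption is mine, for
# illustration only): the more resources r the agent burns pursuing Delta-u,
# the wider the spread of side effects, so Delta-v ~ Normal(0, (c * r)^2).

def prob_dv_bounded(r, bound=1.0, spread_per_resource=0.5):
    """P(|Delta-v| <= bound) when Delta-v is Normal(0, (spread_per_resource * r)^2)."""
    if r == 0:
        return 1.0  # no resources spent, no side effects in this toy model
    sigma = spread_per_resource * r
    return math.erf(bound / (sigma * math.sqrt(2.0)))

for r in [0, 1, 2, 5, 10, 50]:
    print(f"resources spent on Delta-u: {r:>3}   P(|Delta-v| <= 1): {prob_dv_bounded(r):.3f}")
```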
If we don’t have the right reference class to begin with
I’m assuming we don’t. And it’s much easier to define a category V such that we are fairly confident that there is a good utility/reference class in V, than to pick it out. But reduced-impact behaviour might even help if we cannot define V. Even if we can’t say exactly that some humans are morally valuable, killing a lot of humans is likely to be disruptive for a lot of utility functions (in a positive or negative direction), so we get reduced impact from that.
Unlikely, because I’d generally design with equal chances of v and -v (or at least comparable chances).
We don’t know that v is nice; in fact, it’s likely nasty, with -v also being nasty. So we don’t want either of them to be strongly maximised.
I think we have different intuitions about what it means to estimate Δv over an uncertain set / the constraints we’re putting on v. I’m imagining integrating ∫Δv dv, and so if there is any v whose negative is also in the set with the same probability, then the two will cancel out completely, neither of them affecting the end result.
It seems to me like the property you want comes from having non-negative vs, which might have opposite inputs. That is, instead of v_1 being “Bob’s utility function” and v_2 being “Bob’s utility function, with a minus sign in front,” v_3 would be “positive changes to Bob’s utility function that I caused” and v_4 would be “negative changes to Bob’s utility function that I caused.” If we assign equal weight to only v_1 and v_2, it looks like there is no change to Bob’s utility function that will impact our decision-making, since when we integrate over our uncertainty the two balance out.
We’ve defined v_3 and v_4 to be non-negative, though. If we pull Bob’s sweater to rescue him from the speeding truck, v_3 is positive (because we’ve saved Bob) and v_4 is positive (because we’ve damaged his sweater). So we’ll look for plans that reduce both (which is most easily done by not intervening, and letting Bob be hit by the truck). If we want the agent to save Bob, we need to include that in u, and if we do so it’ll try to save Bob in the way with minimal other effects.
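To make the contrast between the two framings concrete, here is a minimal sketch of the Bob example; the numbers (+100 for Bob surviving, -1 for the sweater) and the “penalise v_3 + v_4” rule are invented for illustration only:

```python
# Minimal sketch of the v_1/v_2 vs v_3/v_4 framings above. The utility numbers
# are invented purely for illustration.

# Changes to Bob's utility that the agent itself causes, per plan:
plans = {
    "do nothing":       {},  # the agent causes no changes
    "pull his sweater": {"Bob survives": 100.0, "sweater ruined": -1.0},
}

for name, caused in plans.items():
    dv1 = sum(caused.values())   # v_1: Bob's utility function
    dv2 = -dv1                   # v_2: the same function with a minus sign in front

    # (a) Equal weight on v_1 and v_2: the contributions cancel for every plan,
    # so this uncertainty never favours one plan over another.
    expected_dv = 0.5 * dv1 + 0.5 * dv2

    # (b) Non-negative split: v_3 = positive changes caused, v_4 = negative changes caused.
    dv3 = sum(x for x in caused.values() if x > 0)
    dv4 = -sum(x for x in caused.values() if x < 0)

    print(f"{name:18s} (a) E[dv] = {expected_dv:5.1f}   "
          f"(b) v_3 = {dv3:5.1f}, v_4 = {dv4:4.1f}, penalty v_3 + v_4 = {dv3 + dv4:5.1f}")
```

Under (a) the symmetric pair cancels for every plan, so it never changes a decision; under (b) any change the agent causes adds to the penalty, so inaction scores best unless u itself rewards the rescue.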
What happens here is that as Δu increases and S(u) uses up resources, the probability that Δv will remain bounded (in either direction) decreases strongly. So the best way of keeping Δv bounded is not to burn up many resources on Δu.
Agreed that an AI that tries to maximize “profit” instead of “revenue” is the best place to look for a reduced impact AI (I also think that reduced impact AI is the best name for this concept, btw). I don’t think I’m seeing yet how this plan is a good representation of “cost.” It seems that in order to produce minimal activity, we need to put effort into balancing our weights on possible vs such that inaction looks better than action.
(I think this is easier to formulate in terms of effort spent than consequences wrought, but clearly we want to measure “inaction” in terms of consequences, not actions. It might be very low cost for the RIAI to send a text message to someone, but then that someone might do a lot of things that impact a lot of people and preferences, and we would rather the RIAI just didn’t send the message.)
And it’s much easier to define a category V such that we are fairly confident that there is a good utility/reference class in V, than to pick it out.
It seems to me that any aggregation procedure over a category V is equivalent to a particular utility v*, and so the implausibility that a particular utility function v’ is the right one to pick applies as strongly to v*. For this to not be the case, we need to know something nontrivial about our category V or our aggregation procedure. (I also think we can, given an aggregation procedure or a category, work back from v’ to figure out at least one implied category or aggregation procedure given some benign assumptions.)
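Spelling out the equivalence being claimed, for a finite category V and a weighted-sum aggregation (both of which are illustrative assumptions here):

```python
# Sketch of the point above for a finite category V. The particular members of V,
# the weights, and the outcome encoding are arbitrary placeholders.

# A "utility function" here is just a map from an outcome (a dict) to a real number.
V = [
    ("v_1", lambda outcome: outcome.get("bob", 0.0)),          # Bob's utility
    ("v_2", lambda outcome: -outcome.get("bob", 0.0)),         # its negation
    ("v_3", lambda outcome: outcome.get("paperclips", 0.0)),   # something else entirely
]
weights = [0.25, 0.25, 0.5]

def v_star(outcome):
    """The aggregation over V is itself one particular utility function."""
    return sum(w * v(outcome) for w, (_, v) in zip(weights, V))

# Any decision rule phrased as "aggregate over V with these weights" is the same
# rule phrased as "use the single utility v_star", so v_star needs just as much
# justification as a hand-picked v' would.
print(v_star({"bob": 10.0, "paperclips": 3.0}))   # 0.25*10 + 0.25*(-10) + 0.5*3 = 1.5
```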
The point here is that M(u-v) might not know what v is, but M(εu+v) certainly does, and this is not the same as maximising an unknown utility function.
The point here is that M(u-v) might not know what v is, but M(εu+v) certainly does, and this is not the same as maximising an unknown utility function.
Ah, okay. I think I see better what you’re getting at. My intuition is that there’s a mapping to minimization of a reasonable aggregation of the set of non-negative utilities, but I think I should actually work through some examples before I make any long comments.
Do you disagree with my description of the “resource gathering agent”: http://lesswrong.com/r/discussion/lw/luo/resource_gathering_and_precorriged_agents/
I don’t think I had read that article until now, but no objections come to mind.
My intuition is that there’s a mapping to minimization of a reasonable aggregation of the set of non-negative utilities
That would be useful to know, if you can find examples. Especially ones where all v and -v have the same probability (which is my current favourite requirement in this area).