A child will eventually learn that independence on its own; it’s a very different thing to hardcode the independence as an assumption from the start. Values are also complex, and there might not be a natural line around what it is natural to care about. If I take the idea that the agent’s morality is independent of its decisions really far, it means the AI believes it is already a good boy or a bad boy no matter what it does (it just doesn’t know which). That seems to be the opposite of what moral behavior is about. There is also no guarantee that the valued actions reduce to simple concepts: after all, cake is good unless diabetes, and death might not be all bad if euthanasia.
Also, if I am inconsistent or outright make an error in assigning the labels, I would rather have the AI open a dialogue with me about it than silently construct a really convoluted utility function.
We could of course treat a value loader as a two-step process, where the loaded values are not taken as-is as the actual utility function but are modified to make sense in a “non-cheating” way. That is, the actual utility function always contains terms about moral fairness even if they were never explicitly input. This reduces the amount of value that is softcoded, since the meta-morality is hardcoded. But this layering also solves the original problem: the agent might evaluate the new situation as having lower expected utility and would choose not to go there, if it had a choice. With the layering we take that choice away, so the wrong valuation doesn’t matter. By enforcing a hand-inputted conservation of moral evidence we tie the AI’s hands when it comes to forming a moral strategy of its own; we “know better” beforehand.
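To make the two-step idea above a bit more concrete, here is a minimal Python sketch of what such a layered value loader could look like. Everything in it is my own illustrative construction: the outcome representation, the `load_values` and `meta_moral_wrapper` names, and the crude penalty standing in for the hardcoded meta-moral layer are assumptions, not anyone’s actual proposal.

```python
# Toy sketch (illustrative only) of the "two-step value loader" idea:
# loaded values are not used directly, but are wrapped by a hardcoded
# meta-moral layer before they become the utility the agent optimises.

from typing import Callable, Dict

Outcome = Dict[str, float]          # e.g. {"cake": 1.0, "diabetes": 1.0}
Utility = Callable[[Outcome], float]

def load_values(labels: Dict[str, float]) -> Utility:
    """Step 1: naive softcoded values, taken straight from the human labels."""
    def u(outcome: Outcome) -> float:
        return sum(labels.get(k, 0.0) * v for k, v in outcome.items())
    return u

def meta_moral_wrapper(loaded: Utility) -> Utility:
    """Step 2: hardcoded meta-morality. Here it just adds a large penalty
    for outcomes the agent engineered to shift its own future valuations,
    a crude stand-in for 'conservation of moral evidence'."""
    def u(outcome: Outcome) -> float:
        base = loaded(outcome)
        cheating_penalty = 1000.0 * outcome.get("self_modified_values", 0.0)
        return base - cheating_penalty
    return u

# Usage: the effective utility is always the wrapped one, so the agent
# never optimises the raw loaded values directly.
effective_utility = meta_moral_wrapper(load_values({"cake": 1.0, "diabetes": -5.0}))
print(effective_utility({"cake": 1.0}))                               # 1.0
print(effective_utility({"cake": 1.0, "self_modified_values": 1.0}))  # -999.0
```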
An interesting point, hinting that my approach to moral updating ( http://lesswrong.com/lw/jxa/proper_value_learning_through_indifference/ ) may be better than I supposed.
I was getting more at the fact that this narrows the problem down instead of generalising it. It reduces the responsibilities of the AI and widens those of the humans. If you solved the problem this way you would only get up to the level of the most virtuous human (which isn’t exactly bad). Going beyond that would require an ethics competency that would have to be added separately, since we are tying the AI’s hands in this department.
I take the point in practice, but there’s no reason we couldn’t design something to follow a path towards ultra-ethicshood that had the conservation property. For instance, if we could implement “as soon as you know your morals would change, then change them”, this would give us a good part of the “conservation” law.
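As a rough illustration of what that update rule could mean mechanically, here is a small sketch under the assumption that the agent has some way to predict its own value changes. The `predict_value_update` stub and all the names in it are hypothetical, and the hard part of the proposal (the prediction itself) is deliberately left as a stub.

```python
# Minimal sketch of "as soon as you know your morals would change, change
# them". The point is only the control flow: the agent never acts on values
# it already expects to revise.

from typing import Dict, Optional

def predict_value_update(values: Dict[str, float],
                         evidence: Dict[str, float]) -> Optional[Dict[str, float]]:
    """Stub: return the values the agent expects to hold after seeing
    `evidence`, or None if no change is expected. Any real version of this
    would be the hard part of the proposal."""
    if "euthanasia_case" in evidence:
        updated = dict(values)
        updated["death"] = max(values.get("death", -10.0), -1.0)
        return updated
    return None

def act(values: Dict[str, float], evidence: Dict[str, float]) -> Dict[str, float]:
    # Conservation step: apply any foreseen moral update *before* choosing,
    # so expected future values equal current values at decision time.
    expected = predict_value_update(values, evidence)
    if expected is not None and expected != values:
        values = expected
    # ... choose the action that maximises utility under `values` ...
    return values

current = act({"cake": 1.0, "death": -10.0}, {"euthanasia_case": 1.0})
print(current)   # {'cake': 1.0, 'death': -1.0}
```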