You are assuming that human morality is something which can be specified by a set of exact decision theory equations, or at least roughly approximated by such.
No, I’m not. At least, it’s not humanly possible. An AI could work out a human’s implicit utility function, but it would be extremely long and complicated.
There are cases (like turbulence) where we know the underlying governing equations, but still can’t make predictions beyond a certain threshold.
Human morality is a difficult thing to predict. If you build your AI the same way, it will also be difficult to predict. They will not end up being the same.
If human morality is too complicated for an AI to understand, then let it average over the possibilities. Or at least let it guess. Don’t tell it to come up with something on its own. That will not end well.
I’m not sure how you came to that conclusion.
It was the line:
what we normally think of as human morals is not very compressed, so specifying many of them inconsistently and leaving a few out would still have a high likelihood of resulting in an acceptable moral value function.
In order for this to work, whatever statements we make about our morality must have more information content than morality itself. That is, we not only describe all of our morality, we repeat ourselves several times. It’s like describing gravity by giving the position of a falling ball at fifty points in time: there’s significantly more information in that data than you need to describe gravity, so you can work out the law of gravity from the data alone.
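To make the gravity analogy concrete, here is a minimal sketch. It assumes a ball dropped from rest with no air resistance, so the distance fallen is y = ½·g·t², and it adds made-up measurement noise of about 1 cm. The names `simulate_fall` and `fit_g` are hypothetical helpers. Fifty noisy points pin down one unknown (g) by least squares, and a fifth of the points already does nearly as well, which is the redundancy the argument relies on.

```python
import random

def simulate_fall(g, times, noise_sd, seed=0):
    """Hypothetical data: distance fallen from rest, y = 0.5*g*t^2, plus Gaussian noise."""
    rng = random.Random(seed)
    return [0.5 * g * t * t + rng.gauss(0.0, noise_sd) for t in times]

def fit_g(times, positions):
    """Least-squares fit of y = a*t^2 (a = sum(t^2*y) / sum(t^4)), then g = 2a."""
    num = sum(t * t * y for t, y in zip(times, positions))
    den = sum(t ** 4 for t in times)
    return 2.0 * num / den

times = [0.02 * i for i in range(1, 51)]  # fifty samples from 0.02 s to 1.0 s
positions = simulate_fall(9.81, times, noise_sd=0.01)

g_est = fit_g(times, positions)                  # uses all fifty points
g_subset = fit_g(times[::5], positions[::5])     # uses only ten of them
print(f"g from 50 points: {g_est:.3f} m/s^2")
print(f"g from 10 points: {g_subset:.3f} m/s^2")
```

The point is the ratio of data to unknowns: one parameter, fifty observations. The argument above is that our statements about morality would need a similar surplus for an AI to recover morality from them.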
If our morality is complicated, then specifying many parts of it approximately would result in the AI finding some point in morality space that’s a little off in every area we specified, and completely off in all the areas we forgot about.
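A toy model of this failure mode, offered only as an illustration under loud assumptions: "morality space" is treated as a hypothetical vector of 20 numbers, each specified area is learned with small noise, and every forgotten area falls back to an arbitrary default of 0. None of these numbers come from anywhere; `specify_approximately` is a hypothetical helper.

```python
import random

def specify_approximately(true_values, specified, noise_sd, default=0.0, seed=1):
    """Hypothetical toy model: each specified area is learned with small noise;
    every area nobody mentioned falls back to an arbitrary default."""
    rng = random.Random(seed)
    learned = []
    for i, v in enumerate(true_values):
        if i in specified:
            learned.append(v + rng.gauss(0.0, noise_sd))
        else:
            learned.append(default)
    return learned

rng = random.Random(42)
true_values = [rng.uniform(1.0, 10.0) for _ in range(20)]  # 20 made-up "areas"
specified = set(range(15))  # we remembered 15 areas and forgot 5
learned = specify_approximately(true_values, specified, noise_sd=0.1)

errors = [abs(l - t) for l, t in zip(learned, true_values)]
max_specified = max(errors[i] for i in specified)
min_forgotten = min(errors[i] for i in range(20) if i not in specified)
print(f"worst error in a specified area:  {max_specified:.3f}")
print(f"best error in a forgotten area:   {min_forgotten:.3f}")
```

Every specified area ends up slightly wrong, while every forgotten area is off by the full distance between the default and the true value.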
If that were true, MIRI’s current strategy would be a complete waste of time.
Their strategy is not to figure out human morality and explicitly program that into an AI. It’s to find some way of saying “figure out human morality and do that” that’s not rife with loopholes. Once they have that down, the AI can emulate a variety of human minds, or do whatever it is it needs to do.