My opinion—which Eliezer may find quite silly—is that the formalism of utility functions is itself flawed. I think the key problem with this formalism is that it utterly resists change. As many people have pointed out, if you have some utility function U, which may be very complicated [“we are godshatter”], then a rational agent whose motivational system strives to maximize U will never knowingly change U. One might say that a utility maximizer is motivationally frozen: no input from the real world can change what it thinks is important.
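To spell the point out (a minimal sketch in ordinary expected-utility notation, nothing beyond the standard goal-preservation argument): when the agent considers rewriting its own utility function, both options are scored with the U it has right now,

\[
V(\text{keep } U) = \max_{\pi} \mathbb{E}[U \mid \pi],
\qquad
V(\text{adopt } U') = \mathbb{E}[U \mid \text{a future self that maximizes } U'],
\]

and generically \(V(\text{adopt } U') \le V(\text{keep } U)\), so swapping in U' can only look like a loss (or at best a wash) from U's own point of view.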
In my opinion, the big thing missing in FAI theory is a good understanding of how to set up a motivational architecture which does not have the above property, i.e. a system which allows inputs from the world to change what an agent thinks is important.
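As a crude illustration of the kind of thing I have in mind (a toy of my own, with made-up names, not the proposal from the post linked below): an agent whose "importance weights" are themselves updated by input from the world, rather than a fixed U.

```python
# Toy sketch (my own illustration, hypothetical names throughout): an agent whose
# notion of what is important -- a weight per world feature -- can be shifted by
# input from the world, unlike a fixed utility function U.

class AdaptiveAgent:
    def __init__(self, weights):
        self.weights = dict(weights)  # what the agent currently thinks is important

    def evaluate(self, outcome):
        # Score an outcome with the *current* weights (the analogue of applying U).
        return sum(self.weights.get(f, 0.0) * v for f, v in outcome.items())

    def observe(self, feedback, rate=0.1):
        # Input from the world nudges the weights themselves -- the step a fixed
        # U-maximizer never knowingly takes.
        for f, delta in feedback.items():
            self.weights[f] = self.weights.get(f, 0.0) + rate * delta


agent = AdaptiveAgent({"paperclips": 1.0, "human_welfare": 0.0})
print(agent.evaluate({"paperclips": 10, "human_welfare": 10}))  # 10.0
agent.observe({"human_welfare": 5.0})   # feedback from the world shifts its values
print(agent.evaluate({"paperclips": 10, "human_welfare": 10}))  # 15.0
```

Everything interesting, of course, is in how observe ought to work so that the updates go somewhere good rather than somewhere arbitrary; the toy only shows the degree of freedom that a fixed-U agent lacks.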
I’ve written down a vague idea of how such a thing might work on my blog:
http://transhumangoodness.blogspot.com/2007/10/road-to-universal-ethics-universal.html
Of course this is all very preliminary, but I think that we have to get away from the limiting formalism of fixed utility functions.