Argues for the folk theorem that, in general, rational agents will preserve their utility functions during self-optimization.
The Gandhi example works because he was posited with a single goal. With multiple competing goals, I'd expect some goals to lose, and, having lost, to be more likely to lose the next time.
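The compounding-loss dynamic above can be sketched as a toy simulation. Everything here is my own illustration rather than anything from the post: the goal names, the random "urgency" draws, and the rule that a losing goal's weight is multiplied down after each conflict.

```python
import random

# Toy sketch of the compounding-loss dynamic: two competing goals start
# with equal weight, and whichever goal loses a conflict has its weight
# shrunk, so early losses make later losses more likely. The goal names,
# decay rule, and random urgencies are assumptions for illustration.
def simulate(rounds=50, decay=0.9, seed=0):
    random.seed(seed)
    weights = {"reduce_deaths": 1.0, "avoid_causing_deaths": 1.0}
    for _ in range(rounds):
        # Each goal's effective pull this round: its current weight
        # times a random situational urgency in [0, 1).
        pulls = {g: w * random.random() for g, w in weights.items()}
        # The weaker pull loses the conflict, and losing shrinks its
        # weight for every future round.
        loser = min(pulls, key=pulls.get)
        weights[loser] *= decay
    return weights

final = simulate()
```

Under this rule one goal's weight tends to collapse while the other's stays near 1.0, which is the "having lost, more likely to lose" pattern.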
“Utility functions.” Omohundro argues that agents that don't have utility functions will have to acquire them. I'm not totally sure I believe this is a universal law, but I suspect that something like it is true in many cases, for reasons like those above.
This is the same ‘multiple competing goals’ issue, where the goals are ‘do not be part of a causal chain that leads to the death of others’ and ‘reduce the deaths of others’.
And unchanged circumstances. What would Gandhi do when faced with a trolley problem?