One of the big problems is that most goals include “optimising the universe in some fashion” as a top instrumental outcome (see Omohundro’s “AI drives” paper).
I was thinking about why that seems true, despite being so completely counterintuitive when applied to existing animal intelligences (i.e. us).
Partial possible insight: we conscious people spend a lot of time optimizing our own action-histories, not the external universe. So on some level, an unconscious agent will optimize the universe, whereas a conscious one can be taught to optimize itself (FOOM, yikes), or its own action-history (which more closely approximates human ethics).
Let’s say we could model a semi-conscious agent of the second type mathematically. Would we be able to encode commands like, “Perform narrow job X and do nothing else”?
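To make that contrast concrete, here is a minimal sketch; the toy action set, the two utility functions and the brute-force search are all my own assumptions rather than anything proposed above. The point is just that an agent scoring world-states has no term telling it to stop acting, while an agent scoring its own action-history can be pointed at a history shaped like “do X once, then nothing”.

```python
from itertools import product

# Toy contrast between a utility over world-states and a utility over the
# agent's own action-history. The action set, the utilities and the
# brute-force search are all invented for illustration.

ACTIONS = ["do_X", "grab_resources", "noop"]

def world_state_utility(history):
    """Scores only the resulting world-state: more resources grabbed = better.
    An agent maximising this has no reason ever to stop acting on the world."""
    return sum(1 for a in history if a == "grab_resources")

def action_history_utility(history):
    """Scores the history itself: reward doing X exactly once, penalise any
    action that is neither 'do_X' nor 'noop'. 'Perform narrow job X and do
    nothing else' becomes a statement about the shape of the history."""
    score = 1.0 if history.count("do_X") == 1 else 0.0
    return score - sum(1 for a in history if a not in ("do_X", "noop"))

def best_history(utility, horizon=3):
    """Brute-force the highest-scoring action sequence of a given length."""
    return max(product(ACTIONS, repeat=horizon), key=utility)

print(best_history(world_state_utility))    # ('grab_resources', 'grab_resources', 'grab_resources')
print(best_history(action_history_utility)) # ('do_X', 'noop', 'noop')
```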
I’d say the main reason it’s so counterintuitive is that this behaviour emerges strongly for expected utility maximisers, and we ourselves are unbelievably far from being expected utility maximisers.
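For reference, the decision rule in question is just “pick the action with the highest expected utility”. The toy sketch below (outcome model, utility and action names all invented for illustration) shows why nothing in that rule ever says “and otherwise leave the universe alone”.

```python
# Bare-bones expected utility maximiser over a hand-written outcome model.
# The outcome model, the utility and the action names are all hypothetical.

def expected_utility(action, outcome_model, utility):
    """E[U | action] under a model of the form {action: [(prob, outcome), ...]}."""
    return sum(p * utility(outcome) for p, outcome in outcome_model[action])

def choose(actions, outcome_model, utility):
    """The entire decision rule: argmax_a E[U | a]. Nothing here says
    'and otherwise leave the world alone'; any side effect that nudges
    expected utility upward gets selected, which is where Omohundro-style
    drives come from for agents of exactly this shape."""
    return max(actions, key=lambda a: expected_utility(a, outcome_model, utility))

# Hypothetical numbers: seizing extra resources slightly raises the chance
# of finishing job X, so the maximiser always prefers it.
outcome_model = {
    "just_do_X":                [(0.90, "X_done"), (0.10, "X_failed")],
    "grab_resources_then_do_X": [(0.99, "X_done"), (0.01, "X_failed")],
}
utility = lambda outcome: 1.0 if outcome == "X_done" else 0.0
print(choose(list(outcome_model), outcome_model, utility))  # grab_resources_then_do_X
```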
I’ve read Omohundro’s paper, and while I buy the weak form of the argument, I don’t buy the strong form. Or rather, I can’t accept the strong form without a solid model of the algorithm/mind-design I’m looking at.
In which case we should be considering building agents that are not expected utility maximizers.
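As one standard illustration of a non-maximiser (not something proposed in this thread, and with well-known problems of its own), here is a toy satisficer that takes the first action clearing an aspiration level instead of chasing the argmax; every name and number below is made up.

```python
# A satisficer: accept the first action that is "good enough" rather than
# the expected-utility-maximal one. Illustrative only; actions, utilities
# and the aspiration level are all invented.

def satisfice(actions, expected_utility, aspiration):
    """Return the first action whose expected utility clears the aspiration
    level; only fall back to outright maximisation if nothing does."""
    for a in actions:
        if expected_utility(a) >= aspiration:
            return a
    return max(actions, key=expected_utility)

eu_table = {"noop": 0.0, "just_do_X": 0.9, "grab_resources_then_do_X": 0.99}
# 'just_do_X' already clears the aspiration level, so the slightly
# higher-scoring but more invasive option never gets chosen.
print(satisfice(list(eu_table), eu_table.get, aspiration=0.85))  # just_do_X
```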
If I were you, I’d read Omohundro’s paper http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf, possibly my critique of it http://lesswrong.com/lw/gyw/ai_prediction_case_study_5_omohundros_ai_drives/ (though that is gratuitous self-advertising!), and then figure out what you think about the arguments.