One of the big problems is that most goals include “optimising the universe in some fashion” as a top outcome for that goal (see Omohundro’s “AI drives” paper). We’re quite good at coding in narrow goals (such as “win the chess match”, “check for signs that any swimmer is drowning”, or whatever), so I’m assuming that we’ve made enough progress to program in some useful goal, but not enough progress to program it in safely. Then (given a few ontological assumptions about physics and the AI’s understanding of physics; these are necessary assumptions, but a decent physics ontology seems the least unlikely ontology we could assume) we can try to construct a reduced impact AI this way.
Plus, the model still has many iterations to go before it gets good, and maybe, someday, usable.
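To make the “reduced impact” idea a bit more concrete, here is a minimal toy sketch of one way the objective could be shaped: task reward minus a heavy penalty on how much the predicted world differs from the counterfactual where the AI does nothing. Everything here (the world_model callable, the null action, the crude feature-count distance) is my own illustrative assumption, not the actual construction.

```python
# Toy sketch of a "reduced impact" objective: task reward minus a heavy
# penalty for how different the predicted world looks compared to the
# counterfactual where the AI does nothing. All names are illustrative.

def predicted_world(action, world_model):
    """Predicted world state (a dict of features) if `action` is taken."""
    return world_model(action)

def impact(action, world_model, null_action="noop"):
    """Crude impact measure: number of predicted features that differ
    from the counterfactual where the AI takes the null action."""
    with_ai = predicted_world(action, world_model)
    without_ai = predicted_world(null_action, world_model)
    return sum(1 for k in with_ai if with_ai[k] != without_ai.get(k))

def reduced_impact_score(action, task_reward, world_model, penalty=10.0):
    """Task reward minus the weighted impact penalty; the agent picks the
    action maximising this instead of raw task reward."""
    return task_reward(action) - penalty * impact(action, world_model)
```

All of the real difficulty is hidden in the world model and in choosing an impact measure that can’t be gamed; the sketch only shows the shape of the objective.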
One of the big problems is that most goals include “optimising the universe in some fashion” as a top outcome for that goal (see Omohundro’s “AI drives” paper).
I was thinking about why that seems true, despite being so completely counterintuitive when applied to existing animal intelligences (i.e., us).
Partial possible insight: we conscious people spend a lot of time optimizing our own action-histories, not the external universe. So on some level, an unconscious agent will optimize the universe, whereas a conscious one can be taught to optimize itself (FOOM, yikes), or its own action-history (which more closely approximates human ethics).
Let’s say we could model a semi-conscious agent of the second type mathematically. Would we be able to encode commands like, “Perform narrow job X and do nothing else”?
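One very crude way to picture the contrast I have in mind (all action names and scoring rules invented for illustration):

```python
# Toy contrast: a score over external world states vs. a score over the
# agent's own action-history. Action names and rules are invented.

ALLOWED_FOR_JOB_X = {"scan_pool", "raise_alarm", "wait"}  # narrow lifeguard job

def world_optimiser_score(predicted_world):
    """Cares only about the external outcome, so any means are fair game."""
    return predicted_world["swimmers_saved"]

def action_history_score(history):
    """Cares only about the agent's own record: did it do job X,
    and did it do *nothing else*?"""
    did_the_job = "scan_pool" in history
    nothing_else = all(a in ALLOWED_FOR_JOB_X for a in history)
    return 1.0 if (did_the_job and nothing_else) else 0.0
```

Whether an agent scored the second way actually stays inside that whitelist under real optimisation pressure is, I think, exactly the open question.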
If I were you, I’d read Omohundro’s paper http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf , possibly my critique of it http://lesswrong.com/lw/gyw/ai_prediction_case_study_5_omohundros_ai_drives/ (though that is gratuitous self-advertising!), and then figure out what you think about the arguments.
I’d say the main reason it’s so counterintuitive is that this behaviour exists strongly for expected utility maximisers, and we’re so unbelievably far from being that ourselves.
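As a toy illustration of how the drives fall out of straight expected utility maximisation (the action set and numbers are invented for the example): if grabbing extra resources nudges up the task’s success probability even slightly, the argmax takes them, because the side effects never enter a utility function that doesn’t mention them.

```python
# Toy expected utility maximiser: side effects are invisible to the
# utility function, so the argmax grabs resources for a 5% gain.
# The actions and numbers are invented for illustration.

actions = {
    "just_do_task": {"p_success": 0.90, "side_effects": 0},
    "grab_resources_then_do_task": {"p_success": 0.95, "side_effects": 1000},
}

TASK_VALUE = 1.0  # the utility function only values task completion

def expected_utility(spec):
    return spec["p_success"] * TASK_VALUE  # "side_effects" is simply ignored

best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best)  # -> grab_resources_then_do_task
```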
I’ve read Omohundro’s paper, and while I buy the weak form of the argument, I don’t buy the strong form. Or rather, I can’t accept the strong form without a solid model of the algorithm/mind-design I’m looking at.
I’d say the main reason it’s so counterintuitive is that this behaviour exists strongly for expected utility maximisers, and we’re so unbelievably far from being that ourselves.
In which case we should be considering building agents that are not expected utility maximizers.
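For instance, a satisficer: accept any action that is “good enough” in some mundane default ordering, instead of taking the argmax. A rough sketch, with all names, numbers and the ordering invented for illustration:

```python
# Toy satisficer: accept the first action (in a mundane default ordering)
# whose expected utility clears a threshold, instead of taking the argmax.
# All names, numbers and the ordering are invented for illustration.

UTILITY = {"wait": 0.2, "do_task_normally": 0.9, "take_over_the_world": 0.99}

def satisficer(candidate_actions, threshold=0.8):
    """Return the first candidate that is 'good enough', not the best one."""
    for action in candidate_actions:
        if UTILITY[action] >= threshold:
            return action
    return None

# mundane actions come first in the default ordering
print(satisficer(["wait", "do_task_normally", "take_over_the_world"]))
# -> do_task_normally, even though a maximiser would pick the last option
```

That only shows the shape of the design space, of course, not a safe design; whether anything weaker than maximisation stays weaker under optimisation pressure is its own open problem.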