I’m kinda… disappointed?
I mean, on the one hand, “reduced-impact AI” is obviously a subject of major interest, since it’s much, much closer to what people actually want and mean when they talk about developing AI. A “world-domination optimization process in software form” is not what humanity wants from AI; what people naively want is software that can be used to automate tedious jobs away. “Food truck AI”, I would call it, and since we can in fact get people to perform such functions, I figure there exists a software-based mind design that will happily perform a narrow function within a narrow physical and social context and do nothing else (that is, not even save babies from burning buildings, since that’s the job of the human or robotic firefighters).
However, the problem is, you can’t really do math or science about AI without some kind of model. I don’t see a model here. So we might say, “It has a naive utility function of ‘Save all humans’”, but I don’t even see how you’re managing to program that into your initial untested, unsafe, naive AI (especially since current models of AI agents don’t include pre-learned concept ontologies and natural-language understanding that would make “Save all humans” an understandable command!).
On the other hand, you’re certainly doing a good job of at least setting a foundation for this. Indeed, we do want some information-theoretic measure, or some piece of mathematics, that lets us build minimalistic utility functions describing narrow tasks rather than whole sets of potential universes.
Maybe I’m just failing to get it.
One of the big problems is that most goals include “optimising the universe in some fashion” as a top outcome for that goal (see Omohundro’s “AI drives” paper). We’re quite good at coding in narrow goals (such as “win the chess match”, “check for signs that any swimmer is drowning”, or whatever), so I’m assuming that we’ve made enough progress to program in some useful goal, but not enough progress to program it in safely. Then (given a few ontological assumptions about physics and the AI’s understanding of physics; these assumptions are necessary, but a decent physics ontology seems like the least unlikely ontology we could assume) we can try and construct a reduced-impact AI this way.
Plus, the model has many iterations to go before it gets good, and maybe, someday, usable.
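To make the idea in the comment above slightly more concrete, here is a minimal Python sketch, under heavy assumptions, of what “program in the narrow goal, then penalise impact” could look like: the agent’s score is the narrow task reward minus a KL-divergence penalty between the world it predicts if it acts and the world it predicts if it does nothing (roughly the kind of information-theoretic measure asked for earlier). Everything here (predict_state_dist, the toy actions, the weight of 10) is a hypothetical illustration, not anything from the post or the thread.

```python
import math
from typing import Callable, Dict

Action = str
StateDist = Dict[str, float]  # probability distribution over coarse world states

def kl_divergence(p: StateDist, q: StateDist, eps: float = 1e-12) -> float:
    """KL(p || q): a crude 'how different is the predicted world' measure."""
    return sum(pv * math.log((pv + eps) / (q.get(s, 0.0) + eps))
               for s, pv in p.items() if pv > 0.0)

def reduced_impact_score(
    action: Action,
    noop: Action,
    task_reward: Callable[[Action], float],             # the narrow goal we *can* program
    predict_state_dist: Callable[[Action], StateDist],  # the AI's physics model (assumed to exist)
    impact_weight: float,
) -> float:
    """Narrow-task reward minus a penalty for diverging from the 'do nothing' baseline world."""
    baseline = predict_state_dist(noop)
    outcome = predict_state_dist(action)
    return task_reward(action) - impact_weight * kl_divergence(outcome, baseline)

def choose_action(actions, noop, task_reward, predict_state_dist, impact_weight=10.0):
    """Pick the action with the best impact-penalised score."""
    return max(actions, key=lambda a: reduced_impact_score(
        a, noop, task_reward, predict_state_dist, impact_weight))

# Toy usage: the high-reward action also changes the world a lot, so the penalty rules it out.
rewards = {"make_one_paperclip": 1.0, "convert_factory_to_paperclips": 5.0}
worlds = {
    "do_nothing": {"normal": 1.0},
    "make_one_paperclip": {"normal": 0.99, "slightly_changed": 0.01},
    "convert_factory_to_paperclips": {"normal": 0.1, "heavily_changed": 0.9},
}
print(choose_action(list(rewards), "do_nothing", rewards.get, worlds.get))
# -> "make_one_paperclip"
```

The only design point is that the penalty term carries all the safety weight: with a large impact_weight, the low-impact action wins even though the high-impact one scores better on the raw task. Nothing in this sketch touches the ontology problem raised above, i.e. where predict_state_dist and its state descriptions would come from.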
I was thinking about why that seems true, despite being so completely counterintuitive when applied to existing animal intelligences (i.e., us).
Partial possible insight: we conscious people spend a lot of time optimizing our own action-histories, not the external universe. So on some level, an unconscious agent will optimize the universe, whereas a conscious one can be taught to optimize itself (FOOM, yikes), or its own action-history (which more closely approximates human ethics).
Let’s say we could model a semi-conscious agent of the second type mathematically. Would we be able to encode commands like, “Perform narrow job X and do nothing else”?
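Here is a toy contrast between the two agent types the comment gestures at, assuming (strongly and hypothetically) that we could write the relevant scoring functions at all: the first agent scores predicted world states, the second scores only its own action history, and “perform narrow job X and do nothing else” becomes a whitelist over that history. The names (ALLOWED, narrow_job_score, the food-truck actions) are illustrative only.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Action = str
History = Tuple[Action, ...]

def best_plan_by_world_state(
    actions: Sequence[Action],
    horizon: int,
    simulate: Callable[[History], str],   # assumed world model: history -> resulting world
    world_score: Callable[[str], float],  # utility over world states (the Omohundro-style kind)
) -> History:
    """'Unconscious' agent: pick whichever action history leads to the best-scoring world."""
    return max(product(actions, repeat=horizon), key=lambda plan: world_score(simulate(plan)))

def best_plan_by_action_history(
    actions: Sequence[Action],
    horizon: int,
    history_score: Callable[[History], float],  # scores the conduct itself, not its consequences
) -> History:
    """Action-history agent: optimises its own record of behaviour, never the resulting universe."""
    return max(product(actions, repeat=horizon), key=history_score)

# One crude way to encode "perform narrow job X and do nothing else":
ALLOWED = {"serve_food", "clean_counter", "wait"}

def narrow_job_score(plan: History) -> float:
    served = sum(1 for a in plan if a == "serve_food")
    out_of_scope = sum(1 for a in plan if a not in ALLOWED)
    return served - 1000.0 * out_of_scope  # any action outside the job dominates the score

print(best_plan_by_action_history(["serve_food", "wait", "acquire_resources"], 3, narrow_job_score))
# -> ('serve_food', 'serve_food', 'serve_food')
```

The action-history agent has no term rewarding it for rearranging the universe, since out-of-scope actions are penalised directly; the cost is that someone still has to enumerate what counts as “the job”, which is its own hard problem.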
If I were you, I’d read Omohundro’s paper http://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf, possibly my critique of it http://lesswrong.com/lw/gyw/ai_prediction_case_study_5_omohundros_ai_drives/ (though that is gratuitous self-advertising!), and then figure out what you think about the arguments.
I’d say the main reason it’s so counterintuitive is that this behaviour exists strongly for expected utility maximisers—and we’re so unbelievably far from being that ourselves.
I’ve read Omohundro’s paper, and while I buy the weak form of the argument, I don’t buy the strong form. Or rather, I can’t accept the strong form without a solid model of the algorithm/mind-design I’m looking at.
In which case we should be considering building agents that are not expected utility maximizers.
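To make that contrast concrete (a generic illustration, not a proposal from anyone in the thread): an expected utility maximiser always takes the argmax over actions, while one much-discussed alternative, a satisficer, accepts any action whose expected utility clears a threshold. The utility numbers below are invented for the example.

```python
import random
from typing import Callable, Dict, Sequence

Action = str
# Expected utility of each action under the agent's beliefs (illustrative numbers only).
EXPECTED_UTILITY: Dict[Action, float] = {
    "do_the_narrow_job": 0.90,
    "do_nothing": 0.10,
    "seize_all_resources_to_do_job_better": 0.95,  # the worrying argmax
}

def maximiser(actions: Sequence[Action], eu: Callable[[Action], float]) -> Action:
    """Expected utility maximiser: always the argmax, however extreme."""
    return max(actions, key=eu)

def satisficer(actions: Sequence[Action], eu: Callable[[Action], float], threshold: float) -> Action:
    """Not an EU maximiser: pick randomly among actions that are 'good enough'."""
    good_enough = [a for a in actions if eu(a) >= threshold]
    return random.choice(good_enough) if good_enough else max(actions, key=eu)

actions = list(EXPECTED_UTILITY)
print(maximiser(actions, EXPECTED_UTILITY.get))        # picks the extreme option
print(satisficer(actions, EXPECTED_UTILITY.get, 0.8))  # may settle for the narrow job
```

Satisficers have well-known problems of their own (nothing in this sketch stops one from delegating to, or rewriting itself into, a maximiser), so the point is only that “agent” and “expected utility maximiser” are not synonyms.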