For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendliness.
I don’t think that follows at all. Wireheading is just as much a failure of intelligence as of friendliness.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement-learning agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn’t result in the agent ending up dead. It also maximizes utility, which is what formal AI is all about.
From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I’ve not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.
It’s a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
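To make the distinction concrete, here is a minimal toy sketch in Python (entirely my own construction, not taken from any literature): one agent treats the reward signal itself as the thing to maximize, the other treats reward-labelled episodes as evidence about an underlying code and keeps the inferred code rather than the signal.

    # Toy sketch, purely illustrative: reward-as-target versus reward-as-evidence.
    import numpy as np

    # Moral "training examples": (features of the act, reward actually received).
    # Feature 1 = "helped someone", feature 2 = "lied".
    examples = [((1, 0), +1.0),
                ((0, 1), -1.0),
                ((1, 1), +0.5)]

    def reward_maximizer(can_tamper_with_signal):
        # Treats the signal itself as the goal: given the chance, it pins the
        # reward channel at its maximum, i.e. it wireheads.
        return "pin reward channel at max" if can_tamper_with_signal else "earn reward normally"

    def inferred_code(examples):
        # Treats the rewards as evidence: fit weights w so that w . features
        # roughly reproduces the observed rewards, and keep w as the "code".
        X = np.array([f for f, _ in examples], dtype=float)
        y = np.array([r for _, r in examples], dtype=float)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    print(reward_maximizer(can_tamper_with_signal=True))   # wireheads
    print(inferred_code(examples))   # roughly [+1 for helping, -1 for lying]

The toy only shows that “the signal” and “the code inferred from the signal” are different objects; it says nothing about how to make an agent stay loyal to the second one, which is the hard part.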
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement-learning agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn’t result in the agent ending up dead. It also maximizes utility, which is what formal AI is all about.
You seem rather sure of that. That isn’t a failure mode seen in real-world AIs, or in human drug addicts (etc.) for that matter.
It’s a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
Maybe figuring out how it is done would be easier than solving morality mathematically. It’s an alternative, anyway.
We have reason to believe current AIXI-type models will wirehead if given the opportunity.
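As a toy illustration of why (names and numbers mine, not from any paper): if the agent’s world-model predicts that one available action pins its reward input at the maximum, a planner that just sums predicted reward prefers the plan containing that action.

    # Toy sketch (my construction): reward-input tampering dominates under plain
    # predicted-reward maximization.
    plans = {
        ("do_task", "do_task", "do_task"):           (0.6, 0.6, 0.6),
        ("rewire_own_reward_input", "idle", "idle"): (0.0, 1.0, 1.0),
    }

    def predicted_total(plan):
        return sum(plans[plan])

    print(max(plans, key=predicted_total))   # -> the plan that starts by rewiring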
Maybe figuring out how it is done would be easier than solving morality mathematically. It’s an alternative, anyway.
I would agree with this if and only if we can also figure out a way to hardwire in constraints like, “Don’t do anything a human would consider harmful to themselves or humanity.” But at that point we’re already talking about animal-like Robot Worker AIs rather than Software Superoptimizers (the AIXI/Goedel Machine/LessWrong model of AGI, whose mathematics we understand better).
I know wireheading is a known failure mode. I meant we don’t see many evil-genius wireheaders. If you can delay gratification well enough to acquire the skills to be a world dominator, you are not exactly a wireheader at all.
Are you aiming for a 100% solution, or just reasonable safety?
Sorry, I had meant an AI agent would both wirehead and world-dominate. It would calculate the minimum amount of resources to devote to world domination, enact that policy, and then use the rest of its resources to wirehead.
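A toy version of that calculation (all numbers and curve shapes are mine, purely to illustrate the logic):

    # Toy sketch (my construction): split a fixed budget between "defend the
    # wireheading setup" (the world-domination part) and "wirehead".
    def survival_prob(defense):
        # Assumed: more resources on defense -> survival approaches certainty.
        return 1.0 - 0.1 ** defense

    def wirehead_value(resources):
        # Assumed: wirehead reward scales with whatever resources are left for it.
        return resources

    def expected_reward(defense, budget=10.0):
        return survival_prob(defense) * wirehead_value(budget - defense)

    best_defense = max((d / 10 for d in range(0, 101)), key=expected_reward)
    print(best_defense, expected_reward(best_defense))
    # With these made-up curves the optimum puts only a small slice of the budget
    # into defense (enough for roughly 95% survival) and wireheads with the rest.

The real tradeoff obviously depends on the environment; the point is just that “minimum necessary domination, then wirehead” falls straight out of maximizing the product of the two terms.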
Has that been proven? Why wouldn’t it want to get to the bliss of wirehead heaven as soon as possible? How does it motivate itself in the meantime? Why would a wireheader also be a gratification delayer? Why make elaborate plans for a future self, when it could just rewrite itself to be happy in the present?
Well-designed AIs don’t run on gratification; they run on planning. While it is theoretically possible to write an optimizer-type AI that cares only about the immediate reward in the next moment, and is completely neutral about human researchers shutting it down afterward, it’s not exactly trivial.
If I recall correctly, AIXI itself tries to optimize the total integrated reward from t = 0 to infinity, but it should be straightforward to introduce a cutoff after which point it doesn’t care.
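In symbols (my notation, and hedged, since I’m going from memory about the exact formulation), the contrast is between

    \[
      V_\infty \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} r_t\right]
      \qquad\text{and}\qquad
      V_m \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{m} r_t\right],
    \]

where m is the cutoff: rewards after step m simply do not appear in the objective, so nothing that happens after m can motivate the agent.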
But even with a planning horizon like that you have the problem that the AI wants to guarantee that it gets the maximum amount of reward. This means stopping the researchers in the lab from turning it off before its horizon runs out. As you reduce the length of the horizon (treating it as a parameter of the program), the AI has less time to think, in effect, and creates less and less elaborate defenses for its future self, until you set it to zero, at which point the AI won’t do anything at all (or act completely randomly, more likely).
This isn’t much of a solution, though, because an AI with a really short planning horizon isn’t very useful in practice, and it is still pretty dangerous: someone trying to use one thinks “this AI isn’t very effective, what if I let it plan further ahead”, increases the cutoff to a really huge value, and the AI takes over the world again. There might be other solutions, but most of them would share that last caveat.
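Here is a small toy of that horizon effect (entirely my own construction, with made-up numbers), where the only “defense” on offer is disabling an off-switch, which costs one step of reward now but protects every later step:

    # Toy sketch (my construction): how much "defending its future self" is worth
    # depends directly on the planning horizon.
    def best_value(horizon, switch_disabled=False):
        """Maximum total reward obtainable within `horizon` remaining steps."""
        if horizon == 0:
            return 0.0                    # nothing past the cutoff counts at all
        options = []
        # Option 1: just collect reward this step. While the off-switch is live,
        # expected reward per step is lower (the lab may shut the agent down).
        step_reward = 1.0 if switch_disabled else 0.4
        options.append(step_reward + best_value(horizon - 1, switch_disabled))
        # Option 2: spend this step disabling the off-switch: no reward now,
        # full reward every later step -- the "defend its future self" move.
        if not switch_disabled:
            options.append(best_value(horizon - 1, True))
        return max(options)

    for h in (0, 1, 2, 5, 20):
        print(h, best_value(h))
    # Horizon 0: no reward is reachable, so there is nothing to act for.
    # Horizon 1: just grab the immediate reward.
    # Horizon 2 and up: the plan that starts by disabling the off-switch wins.

Nothing here is specific to off-switches; any resource spent on protecting future reward behaves the same way under the cutoff.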
My advice would be to read the relevant papers.
http://www.idsia.ch/~ring/AGI-2011/Paper-B.pdf