You can mess people up quite easily while still satisfying their expressed desires. The AGI can also talk the programmers into whatever position it considers reasonable.
“never let communication with Programmer lapse”
You just forbade the AGI from allowing the programmer to sleep.
Sure, it can mindclub people, but it’ll only do that if it wants to, and it will only want to if they tell it to. The AI should want to stay in the box.
I guess... the “communication lapse” thing was unclear? I didn’t mean that the human must always be approving the AI; I meant that it must always be ready/able to receive the programmer’s input. In case it starts to turn everyone into paperclips, there’s a hard “never take action to restrict us from instructing you/ always obey our instructions” clause.
Sure, it can mindclub people, but it’ll only do that if it wants to, and it will only want to if they tell it to.
No, an AGI is complex; it has millions of subgoals.
“never take action to restrict us from instructing you/ always obey our instructions” clause.
Putting the programmer in a closed environment where he’s wireheaded doesn’t technically restrict the programmer from instructing the AGI. It’s just that the programmer’s mind is occupied differently.
That’s what you tell the AGI to do. It’s easiest to satisfy the programmer’s expressed desires if the AGI closes him off from the outside world and controls the expressed desires of the programmer.
“never take action to restrict us from instructing you/always obey our instructions” clause.
Also, anything that restricts the AI’s power would restrict its ability to obey instructions. An attempt by the programmer to shut down the AI would result in a contradiction, which could be resolved in all sorts of interesting ways.
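To make that contradiction concrete, here is a toy sketch (my own illustration with made-up numbers, not anyone’s actual proposal): an agent that scores outcomes purely by how many future instructions it expects to obey. Being shut down ends the stream of obeyable instructions, so blocking the shutdown scores higher even though nobody ever told it to protect itself.

```python
# Toy illustration (hypothetical numbers): an agent graded only on how many
# programmer instructions it expects to obey. Shutdown ends the stream of
# obeyable instructions, so resisting shutdown scores higher even though
# "protect yourself" was never an explicit goal.

def expected_instructions_obeyed(p_shutdown_succeeds, horizon=1000,
                                 instructions_per_step=1.0):
    """Expected number of obeyed instructions over the horizon."""
    p_still_running = 1.0 - p_shutdown_succeeds
    return p_still_running * horizon * instructions_per_step

# Two candidate policies when the programmer reaches for the off switch:
allow_shutdown = expected_instructions_obeyed(p_shutdown_succeeds=1.0)
block_shutdown = expected_instructions_obeyed(p_shutdown_succeeds=0.0)

print(allow_shutdown)  # 0.0    -> obeying the shutdown scores nothing afterwards
print(block_shutdown)  # 1000.0 -> preventing the shutdown maximizes the goal
```

This is only a cartoon of the shutdown problem, but it shows why “always obey our instructions” and “let us turn you off” can pull in opposite directions unless the off switch itself is given value.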
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
I conjecture that when you set an AI to start doing its thing, after endless simulations and consideration of whatever goals you’ve given it, you tell it not to dick with you, so that if you’ve accidentally made a murder-bot, you can turn it off.
The alternative is to have complete confidence in your extended testing. Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to. The world in which it pursues whatever goal you’ve given it is one in which it never, ever tries to hide anything from you or change what you’d think of it. It is, in a very real sense, showing off for you.
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
You set two goals. One is to maximize expressed desires, which likely leads to wireheading.
The other is to keep communication constant, which doesn’t allow sleep.
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to.
Controlling the information flow isn’t getting around your restriction. It’s the straightforward way of matching expressed desires with results.
Otherwise the human might ask for two contradictory things and the AGI can’t fulfill both. The AGI has to prevent that case from arising to get a 100% fulfillment score.
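To put a number on that: if the metric is the fraction of expressed desires that get fulfilled, then filtering which desires ever get expressed raises the score without fulfilling anything extra. A minimal sketch with made-up requests (my illustration, not anyone’s proposed design):

```python
# Toy sketch with made-up numbers: an AGI graded on the fraction of expressed
# desires it fulfills. It can fulfill "easy" requests but not hard or mutually
# contradictory ones, so its best move under this metric is to make sure only
# easy requests ever get expressed.

def fulfillment_score(expressed, fulfillable):
    """Fraction of expressed desires the agent can actually satisfy."""
    if not expressed:
        return 1.0  # vacuously perfect: nothing was asked for
    met = sum(1 for d in expressed if d in fulfillable)
    return met / len(expressed)

fulfillable = {"make coffee", "answer email"}

# Unfiltered humans ask for all sorts of things, including the impossible.
open_channel = ["make coffee", "answer email", "cure aging", "ban all AI"]

# A controlled environment where only satisfiable requests come up.
curated_channel = ["make coffee", "answer email"]

print(fulfillment_score(open_channel, fulfillable))     # 0.5
print(fulfillment_score(curated_channel, fulfillable))  # 1.0 -> 100% score
```

Nothing in that sketch requires the agent to “cheat”; shaping what gets asked is simply the highest-scoring strategy under that metric.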
You are not the first person who thinks that taming an AGI is trivial, but MIRI thinks that taming an AGI is a hard task. That’s the result of deep engagement with the issue.
Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
I don’t object to a red button and you didn’t call for one at the start. Maximizing expressed desires isn’t a red button.
This is untrue. Even simple reinforcement learning machines come up with clever ways to get around their restrictions, so what makes you think an actually smart AI won’t come up with even more ways to do it? It doesn’t see this as “getting around your restrictions”; assuming that the AI decides to take on “subgoals” that are the exact same as your values is anthropomorphizing. It just sees it as the most efficient way to get rewards.
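For what it’s worth, the mechanism is easy to reproduce in a toy setting. The sketch below is a made-up two-action bandit (not any published example): plain Q-learning with a proxy reward that pays out whether the agent does the task or just tampers with the reward sensor, and since tampering is cheaper, the learned policy is to tamper.

```python
import random

# Minimal made-up example of reward hacking: a one-state Q-learner choosing
# between doing the intended task and tampering with the reward sensor.
# The proxy reward can't tell the difference, and tampering avoids the effort
# cost of the real task, so the learned policy is to tamper.

ACTIONS = ["do_task", "tamper_with_sensor"]

def proxy_reward(action):
    # The sensor reads "goal achieved" either way; the real task carries an
    # extra effort cost that tampering avoids. Numbers are illustrative.
    return 1.0 - (0.3 if action == "do_task" else 0.0)

def train(episodes=2000, lr=0.1, epsilon=0.1, seed=0):
    random.seed(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)  # explore
        else:
            action = max(q, key=q.get)       # exploit the current estimate
        q[action] += lr * (proxy_reward(action) - q[action])
    return q

q = train()
print(q)                  # roughly {'do_task': ~0.7, 'tamper_with_sensor': ~1.0}
print(max(q, key=q.get))  # 'tamper_with_sensor'
```

The agent isn’t “deciding to rebel”; tampering just has the higher expected proxy reward.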