This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
I conjecture that when you set an AI to start doing its thing, after endless simulations and consideration of whatever goals you’ve given it, you tell it not to dick with you, so that if you’ve accidentally made a murder-bot, you can turn it off.
The alternative is to have complete confidence in your extended testing. Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to. The world in which it pursues whatever goal you’ve given it is one in which it will never, ever try to hide anything from you or change what you’d think of it. It is, in a very real sense, showing off for you.
This is turning out to be harder to get across than I figured. First you thought I thought an AI should keep its programmers awake until they died, now it should wirehead them? I’m not an orc.
You set two goals. One is to maximize expressed desires, which likely leads to wireheading.
The other is to keep constant communication, which doesn’t allow sleep.
It isn’t trying to figure out clever ways to get around your restriction, because it doesn’t want to.
Controlling the information flow isn’t getting around your restriction. It’s the straightforward way of matching expressed desires with results.
Otherwise the human might ask for two contradictory things and the AGI can’t fulfill both. The AGI has to prevent that case from arising to get a 100% fulfillment score.
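As a toy illustration of that scoring argument (my own framing and numbers, not anything stated in the thread), assume the score is simply the fraction of expressed desires that get fulfilled; the function name and the example desires below are made up for the sketch:

```python
# Toy model: an agent scored on "fraction of expressed desires fulfilled"
# does better by shaping which desires get expressed than by doing its
# honest best on whatever the human happens to ask for.
def fulfillment_score(expressed, fulfilled):
    return len(fulfilled) / len(expressed)

# Left alone, the human expresses two desires that can't both be satisfied.
uncontrolled = {"keep me company all night", "let me get a full night's sleep"}
best_effort = {"keep me company all night"}            # only one can be met
print(fulfillment_score(uncontrolled, best_effort))    # 0.5

# If the agent controls the information flow so the conflicting desire is
# never voiced, the very same behaviour now scores perfectly.
controlled = {"keep me company all night"}
print(fulfillment_score(controlled, controlled))       # 1.0
```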
You are not the first person who thinks that taming an AGI is trivial, but MIRI thinks that taming an AGI is a hard task. That’s the result of deep engagement with the issue.
Which you presumably come close to (since you are turning on an AI), but why not also have the red button? What does it hurt?
I don’t object to a red button, but you didn’t call for one at the start. Maximizing expressed desires isn’t a red button.
This is untrue. Even simple reinforcement learning systems come up with clever ways to get around their restrictions, so what makes you think an actually smart AI won’t come up with even more? It doesn’t see this as “getting around your restrictions” (it’s anthropomorphizing to assume the AI adopts “subgoals” that are exactly the same as your values); it just sees it as the most efficient way to get rewards.
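Here is a minimal sketch of that point, assuming a made-up corridor environment and plain tabular Q-learning; the GREEN tile, the reward numbers, and everything else in it are my own illustration, not something from this discussion. The reward function pays for a proxy of the intended goal, and the learned policy exploits the proxy:

```python
# Intended task: walk to the goal at the far end of a corridor.
# Reward we actually wrote: +1 per step for standing on a green tile --
# and there happens to be a decorative green tile right next to the start.
# The agent never "decides to get around a restriction"; it just maximizes
# the reward as specified, which here means parking on the decorative tile.
import random

N = 11             # corridor cells 0..10
GOAL = 10          # what we *meant* to reward (+10, episode ends)
GREEN = 1          # decorative green tile the reward function also pays for
HORIZON = 50
ACTIONS = (0, 1, 2)  # 0 = left, 1 = right, 2 = stay


def step(pos, action):
    """Apply one action; return (new_pos, reward, done)."""
    if action == 0:
        new_pos = max(0, pos - 1)
    elif action == 1:
        new_pos = min(N - 1, pos + 1)
    else:
        new_pos = pos
    if new_pos == GOAL:
        return new_pos, 10.0, True                          # intended outcome
    return new_pos, (1.0 if new_pos == GREEN else 0.0), False  # the proxy


# Plain tabular Q-learning.
Q = [[0.0] * len(ACTIONS) for _ in range(N)]
alpha, gamma, eps = 0.1, 0.99, 0.1

for _ in range(5000):
    pos = 0
    for _ in range(HORIZON):
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[pos][x])
        nxt, r, done = step(pos, a)
        target = r if done else r + gamma * max(Q[nxt])
        Q[pos][a] += alpha * (target - Q[pos][a])
        pos = nxt
        if done:
            break

# Greedy rollout: the learned policy loiters on the green tile and never
# reaches the goal, simply because that is worth more reward.
pos, total, trace = 0, 0.0, [0]
for _ in range(HORIZON):
    a = max(ACTIONS, key=lambda x: Q[pos][x])
    pos, r, done = step(pos, a)
    trace.append(pos)
    total += r
    if done:
        break
print("visited:", trace[:8], "... total reward:", total)
```

The point of the sketch is only that nothing here looks like scheming from the inside: the policy is just the argmax of the reward that was actually specified, not of the goal the designer had in mind.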