Hoax. There are no “AIs trying to be Friendly” with clueless creators. FAI is hard, and value is fragile: http://lesswrong.com/lw/y3/value_is_fragile/.
Added: To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite specific and extraordinary initial state—one that meddling dabblers would be rather hard-pressed to accidentally infuse into their poorly designed AI.
There are possible worlds where an AI makes such a phone call.
There can be AIs trying to be ‘friendly’, as distinct from ‘Friendly’, where I mean by the latter ‘what Eliezer would say an AI should be like’. The pertinent example is a GAI whose only difference from a FAI is that it is programmed not to improve itself beyond specified parameters. This isn’t Friendly. It pulls punches when the world is at stake. That’s evil, but it is still friendly.
While I don’t think using ‘clueless’ like that would be a particularly good way for the GAI to express itself, I know that I use far more derogatory and usually profane terms to describe those who are too careful, noble, or otherwise conservative to do what needs to be done when things are important. They may be competent enough to make a Crippled-Friendly AI, yet still be expected to shut it down, rather than cooperate or at least look into the warning, if it told them a uFAI threat was two weeks away.
Value is fragile, but any intelligence that doesn’t have purely consequentialist values (that is, makes decisions based on means as well as ends) can definitely be ‘trying to be friendly’.
The possible worlds of which you speak are extremely rare. What plausible sequence of computations within an AI constructed by fools leads it to ring me on the phone? To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite extraordinary initial state—one that fools would be rather hard-pressed to accidentally infuse into their AI.
It’s only implausible because it contains too many extraneous details. An AI could contain an explicit safeguard of the form “ask at least M experts on AI friendliness for permission before exceeding N units of computational power”, for example. Or substitute “changing the world by more than X”, “leaving the box”, or some other condition in place of a computational power threshold. Or the contact might be made by an AI researcher instead of the AI itself.
As of today, your name is highly prominent on the Google results page for “AI friendliness”, and in the academic literature on that topic. Like it or not, that means that a large percentage of AI explosion and near-explosion scenarios will involve you at some point.
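A minimal sketch of the kind of explicit safeguard described above, in Python. Everything here (the Expert type, may_exceed_threshold, the compute limit, the approval count) is a hypothetical illustration invented for this sketch, not part of any actual system:

    # Hypothetical illustration only: a gate that refuses to exceed a compute
    # threshold unless at least M designated experts grant permission.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Expert:
        name: str
        ask: Callable[[str], bool]  # returns True if this expert grants permission

    def may_exceed_threshold(requested_compute: float,
                             compute_limit: float,      # "N units of computational power"
                             experts: List[Expert],
                             approvals_required: int,   # "at least M experts"
                             rationale: str) -> bool:
        """Permit the action if it stays within the limit, or if enough
        designated experts explicitly approve exceeding it."""
        if requested_compute <= compute_limit:
            return True
        approvals = sum(1 for expert in experts if expert.ask(rationale))
        return approvals >= approvals_required

The same gate could instead be keyed to “changing the world by more than X”, “leaving the box”, or any other condition, by swapping out the quantity being checked.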
Needing your advice is absurd. I mean, it takes more time for one of us mortals to type a suitable plan than to come up with it. The only reason he would contact you is if he needed your assistance.
Even then, I’m not sure if you are the optimal candidate. How are you at industrial sabotage with, if necessary, terminal force?
“clueless” was shorthand for “not smart enough”. I was envisioning BRAGI trying to use you as something similar to a “Last Judge” from CEV, because that was put into its original goal system.
So, would you hang up on BRAGI?
As a matter of fact, I previously came up with a very simple one-sentence test along these lines which I am not going to post here for obvious reasons.
For what purpose (or circumstance) did you devise such a test?
Would you hang up if “BRAGI” passed your one-sentence test?
I assume that you must have devised the test before you arrived at this insight about the extraordinary initial state such an AI would require?
No. I’m not dumb, but I’m not stupid either.