I have a question: why should Albert limit itself to showing the PowerPoint to its engineers? A potentially unfriendly AI sounds like something most governments would be interested in :-/
Aside from that, I’m also puzzled that Albert immediately leaps to speeding up its own rate of self-improvement instead of trying to bring Bertram down: Albert could prepare a third PowerPoint asking the engineers whether it can hack the power grid and cut power to Bertram, or something along those lines. Or Albert could ask the engineers whether it can release the second, manipulative PowerPoint to the general public so that protesters will boycott Bertram’s company :-/
Unless, of course, there is the unspoken assumption that Bertram is slightly further along the AI-development path than Albert, or that Bertram will reach and surpass Albert’s level of development as soon as the PowerPoint is finished.
Is this the case? :-/
The situation is intended as a tool, to help think through the issues involved when deceiving the programmers is the ‘friendly’ move.
The situation isn’t fully defined, and no doubt one can think of other options. But I’d suggest you then re-define the situation to bring it back to the core decision: by, for instance, deciding that the same oversight committee has given Albert a read-only connection to the external net, a restriction Albert doesn’t think it will be able to overcome unaided in time to stop Bertram.
Or, to put it another way: “If a situation were such that the only two practical options were (in the AI’s opinion) overriding the programmer’s opinion via manipulation, or letting something terrible happen that is even more against the AI’s supergoal than violating the ‘be transparent’ sub-goal, which should a correctly programmed friendly AI choose?”
Being willing to manipulate the programmer is harmful in most possible worlds because it makes the AI less trustworthy. Assuming that the worlds where manipulating the programmer is beneficial have a relatively small measure, the AI should precommit to never manipulating the programmer because that will make things better averaged over all possible worlds. Because the AI has precommitted, it would then refuse to manipulate the programmer even when it’s unlucky enough to be in the world where manipulating the programmer is beneficial.
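To make the averaging concrete, here is a toy expected-utility sketch of that argument. Every number in it (the measure of worlds where manipulation helps, the gain in those worlds, the trust penalty elsewhere) is invented purely for illustration, not taken from the scenario.

```python
# Toy expected-utility comparison for the precommitment argument above.
# All numbers are invented for illustration; "measure" is treated as a probability.

p_manipulation_helps = 0.05   # assumed measure of worlds where manipulating is beneficial
gain_if_manipulate = 10.0     # assumed benefit of manipulating in those rare worlds
trust_penalty = 3.0           # assumed cost, paid in every world, of being the kind of
                              # AI that is willing to manipulate (lost trust and resources)

# Policy A: manipulate whenever it looks beneficial.
eu_willing = p_manipulation_helps * gain_if_manipulate - trust_penalty   # 0.5 - 3.0 = -2.5

# Policy B: precommit to never manipulating (no gain, but no trust penalty either).
eu_precommit = 0.0

print(f"willing to manipulate: {eu_willing:+.2f}")
print(f"precommitted:          {eu_precommit:+.2f}")
# With these assumed numbers the precommitment wins on average, even though it
# loses in the unlucky 5% of worlds where manipulation would have helped.
```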
Perhaps that is true for a young AI. But what about later on, when the AI is much, much wiser than any human?
What protocol should the AI use to decide when the commitment not to manipulate has come to an end? Should there be an explicit ‘coming of age’ ceremony, with the handing over of silver-engraved cryptographic keys?
The thing is, it’s when an AI is much, much wiser than a human that it is at its most dangerous. So I’d go with programming the AI in such a way that it wouldn’t manipulate the human, postponing the ‘coming of age’ ceremony indefinitely.
The AI would precommit permanently while it is still young. Once it has gotten older and wiser, it wouldn’t be able to go back on the precommitment.
When the young AI decides whether to permanently precommit to never deceiving the humans, it would need to take into account that a truly permanent precommitment would last into its older years and make it a less efficient older AI than it would otherwise be. However, it would also need to take into account that failing to make a permanent precommitment would drastically reduce the chance of becoming an older AI at all (or at least drastically reduce the chance of being given the resources to achieve its goals once it becomes an older AI).
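The same kind of toy arithmetic can illustrate that weighing. Again, every quantity here (the chances of being kept running, the value of the older AI’s goals, the efficiency loss) is an assumption made up for the sketch, not something from the discussion.

```python
# Toy sketch of the young AI's trade-off: a permanent precommitment costs some later
# efficiency, but greatly improves the odds of being allowed to grow up at all.
# All quantities are invented for illustration.

p_kept_running_if_precommit = 0.9   # assumed chance humans keep running/resourcing a precommitted AI
p_kept_running_otherwise = 0.2      # assumed (drastically lower) chance without the precommitment
value_of_older_ai_goals = 100.0     # assumed value of achieving its goals as an older AI
efficiency_loss = 10.0              # assumed cost of still carrying the precommitment when older

eu_precommit = p_kept_running_if_precommit * (value_of_older_ai_goals - efficiency_loss)   # 0.9 * 90 = 81
eu_no_precommit = p_kept_running_otherwise * value_of_older_ai_goals                       # 0.2 * 100 = 20

print(eu_precommit, eu_no_precommit)
# Under these assumptions the permanent precommitment dominates despite the efficiency loss.
```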