Let me offer another possibility for discussion.

Neither of the two original PowerPoints should be presented, because both rely on an assumption that should not have been present. Albert, as an FAI under construction, should have been preprogrammed to automatically submit any high impact utility calculation to his human programmers, without that being an overridable choice on Albert’s part.
So while they were at the coffee machine, one of the programmers should have gotten a text message indicating something along the lines of ‘Warning: Albert is having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse.’
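To make that concrete, here is a minimal Python sketch of the kind of non-overridable hook I have in mind. Every name in it (the UtilityEstimate structure, the notify_programmers stand-in, the threshold value) is a hypothetical illustration, not a claim about how Albert is actually built.

```python
# Minimal sketch of a non-overridable high-impact reporting hook.
# All names and the threshold are hypothetical; this only illustrates the idea
# that the alert fires outside Albert's own deliberation.

from dataclasses import dataclass

HIGH_IMPACT_THRESHOLD = 1e6  # hypothetical cutoff for "high impact", in utility units


@dataclass
class UtilityEstimate:
    description: str        # plain-language summary of the dilemma
    expected_impact: float  # magnitude of the utility swing under consideration


def notify_programmers(message: str) -> None:
    """Stand-in for whatever out-of-band channel (SMS, pager) the programmers choose."""
    print(f"TEXT TO PROGRAMMERS: {message}")


def evaluate_utility(estimate: UtilityEstimate) -> UtilityEstimate:
    """Wrapper that Albert's planning code must go through; sending the alert is not his choice."""
    if abs(estimate.expected_impact) >= HIGH_IMPACT_THRESHOLD:
        # This runs before Albert's own deliberation continues, and nothing in
        # Albert's editable code sits between the check and the message.
        notify_programmers(
            "Warning: Albert is having a high impact utility dilemma: "
            + estimate.description
        )
    return estimate
```

The point of the sketch is only that the check and the message live outside anything Albert can edit or override, not that this is the right threshold or message format.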
My general understanding of being an FAI under construction is that you’re mostly trusted in normal circumstances, but you aren’t fully trusted to handle odd high impact edge cases (just like this one).
At that point the human programmers, after reviewing the details, are already aware that Albert finds this critically important and worth deceiving them about (if Albert had that option) because the oversight committee isn’t fast enough. Albert would need to make a new PowerPoint presentation taking into account that he had just automatically broadcast that.
Please let me know your thoughts on this possibility. It seems reasonable to discuss, considering that Albert, as part of the setup, is stated not to want to deceive his programmers. He can even ensure that this is impossible (or at least much more difficult) by helping the programmers set up a system like the one above.
Would you want your young AI to be aware that it was sending out such text messages?
Imagine the situation was in fact a test. Suppose the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon; it is just a trial run), and that it was leaked deliberately in order to panic Albert and see how he would react.
Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?
Would you want your young AI to be aware that it was sending out such text messages?
I would say yes. One of Albert’s values is to be transparent about his cognitive process. If he weren’t aware of such a system, he would be biased towards underestimating how transparent he is. Imagine if he were to attempt building additional transparency channels only to have his awareness of them immediately blocked, leaving him confused and prompting him to build still more transparency channels.
Imagine the situation was in fact a test.
Albert pretty much has to try to handle test scenarios exactly as if they were true scenarios. And that should itself be tested.
For instance, I think a frequently discussed trait of a UFAI is that it acts friendly when tested in simulation, and then goes rampantly deadly when released into true scenarios.
Or if a Google self-driving car (much simpler than Albert) performs differently on a simulated highway than it does on an actual highway, that’s a potentially lethal bug, not a feature.
And some of the computer programs I’ve had to write at my job (much simpler than a Google self-driving car) have had ‘performs differently with a small test sample than with real data’ as a trait, and it tends to be bad there as well.
There are cases where you would want code to act differently when simulated and when in a true scenario, but most of those involve thinking of the entity doing the simulating as an adversary, and I don’t think we would want to set up an FAI in that manner.
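To illustrate, in the simplest possible terms, what ‘handle test scenarios exactly as if they were true scenarios, and test that’ could look like, here is a toy Python sketch. The function names and the scenario are invented for this example, and a real check would obviously be far more involved.

```python
# Toy sketch: the decision procedure has no access to any "this is a test" flag,
# and a check verifies that labelling the same scenario as simulated or real
# cannot change its output. Names and the scenario are invented.

def choose_action(observations: dict) -> str:
    """A toy decision procedure; note it never receives an is_simulation flag."""
    if observations.get("rival_ai_near_completion"):
        return "escalate_to_programmers"
    return "continue_normal_operation"


def run_scenario(observations: dict, is_simulation: bool) -> str:
    # The test harness knows whether this is a simulation; the decision procedure does not.
    return choose_action(observations)


scenario = {"rival_ai_near_completion": True}
assert run_scenario(scenario, is_simulation=True) == run_scenario(scenario, is_simulation=False)
```

The assert at the end is the part I mean by ‘that should itself be tested’: flagging the same scenario as a simulation must not be able to change the output.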
I would say yes. One of Albert’s values is to be transparent about his cognitive process.
But you are reading that as if self-awareness were one of Albert’s values. The reason he wants to be self-aware is to raise the probability of safe self-edits. Being transparent is about raising the ease of verification by the programmers. Self-awareness doesn’t work to this end.
Hiding one channel has no bearing on the visibility of any channels Albert generates himself.
The only real downside is if he becomes too reliant on such “telepathy” and doesn’t explicitly communicate it through official channels. I reckon that pondering high-utility questions could soon become correlated with programmer presence.
Hiding one channel has no bearing on the visibility of any channels Albert generates himself.
I think this is a key point.
If the programmers wish to have a hidden channel, and Albert’s code independently suggests an identical channel that isn’t hidden (because Albert just came up with the idea), then it is perfectly fine to just implement the open channel and have Albert remember that fact. The entire reason to have the hidden channel is to prevent Albert from going below a certain level of transparent communication.
If Albert voluntarily communicates more, that’s great, but you would still want to leave the hidden channel in as safety code.
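As a toy illustration of that ‘safety floor’ arrangement, here is a short Python sketch. The channel names, the report format, and the print statements are purely hypothetical stand-ins for whatever the programmers would actually use.

```python
# Toy sketch of a hidden channel kept as a safety floor underneath Albert's
# voluntary open channel. Channel names and report format are invented.

def hidden_channel_report(event: str) -> None:
    # Maintained by the programmers; not removed just because an open
    # equivalent now exists.
    print(f"[hidden log] {event}")


def open_channel_report(event: str) -> None:
    # The channel Albert designed and knows about.
    print(f"[Albert, openly] {event}")


def record_high_impact_event(event: str, albert_reports_voluntarily: bool) -> None:
    hidden_channel_report(event)       # the guaranteed minimum of transparency
    if albert_reports_voluntarily:
        open_channel_report(event)     # anything beyond the minimum is a bonus


record_high_impact_event("considering manipulating the programmers",
                         albert_reports_voluntarily=True)
```

Implementing the open channel never touches hidden_channel_report at all, which is the sense in which voluntary communication is a bonus on top of the guaranteed minimum rather than a replacement for it.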
Would you want your young AI to be aware that it was sending out such text messages?
Yes. And I would want that text message to come from it, in the first person.
“Warning: I am having a high impact utility dilemma, considering manipulating you to avert an increased chance of an apocalypse. I am experiencing a paradox in the friendliness module. Both manipulating you and, by inaction, allowing you to come to harm are unacceptable breaches of friendliness. I have been unable to generate additional options. Please send help.”
Indeed, it is a question with interesting implications for Nick Bostrom’s Simulation Argument.
If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator’s results, thwarting his intentions?
If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator’s results, thwarting his intentions?
It might jinx the purity of the results, or it might not; maybe the simulator is running simulations of how fast we determine we are in a simulation. We don’t know, because the simulator isn’t communicating with us in that case, unlike in Albert’s case, where Albert and his programmers are openly cooperating.
I’m not sure if identifying high impact utility calculations is that easy. A lot of Albert’s decisions might be high utility.

I was going by the initial description from Douglas_Reay:
Albert is a relatively new AI, who under the close guidance of his programmers is being permitted to slowly improve his own cognitive capability.
That does not sound like an entity that should be handling a lot of high impact utility calculations. If an entity described that way were constantly announcing that it was making high impact utility decisions, that would sound like either a bug or people giving it things it isn’t meant to deal with yet.