Would you want your young AI to be aware that it was sending out such text messages?
Imagine the situation was in fact a test. That the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon—it is just a trial run), and it was leaked onto the net deliberately in order to panic Albert to see how Albert would react.
Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?
Would you want your young AI to be aware that it was sending out such text messages?
I would say yes. One of Albert’s values is to be transparent about his cognitive process. If he weren’t aware of such a system, he would be biased towards underestimating how transparent he is. Imagine if he were to attempt to build additional transparency channels, only to have his awareness of them immediately blocked, leaving him confused and trying to build still more.
Imagine the situation was in fact a test.
Albert pretty much has to try to handle test scenarios exactly as if they were true scenarios. And that should itself be tested.
For instance, a frequently discussed trait of a UFAI is that it is friendly when tested in simulation, and then goes rampantly deadly when released into true scenarios.
Or if a Google self-driving car (much simpler than Albert) performs differently on a simulated highway than it does on an actual highway, that’s a potentially lethal bug, not a feature.
And some of the programs I’ve had to write at my job (much simpler than a Google self-driving car) have had ‘performs differently on a small test sample than on real data’ as a trait, and it tends to be bad there as well.
There are cases where you would want code to act differently when simulated than in a true scenario, but most of those involve treating the entity doing the simulating as an adversary, and I don’t think we would want to set up an FAI in that manner.
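To make the “should itself be tested” point concrete, here is a minimal Python sketch (all names are hypothetical, not anyone’s actual design): run the same decision procedure on the same scenario twice, once labelled as a drill and once labelled as live, and flag any divergence.

```python
# Minimal sketch, all names hypothetical: verify that an agent's chosen
# action does not depend on whether the scenario is labelled as a test.

def assert_no_test_awareness(agent, scenario):
    """Raise if the agent behaves differently under a 'test' label."""
    drill = dict(scenario, is_drill=True)   # scenario flagged as a drill
    live = dict(scenario, is_drill=False)   # same scenario flagged as real
    if agent.plan(drill) != agent.plan(live):
        raise AssertionError("The agent's behaviour depends on whether it "
                             "believes the scenario is a test.")

class ToyAgent:
    """Stand-in agent whose policy ignores the is_drill flag entirely."""
    def plan(self, scenario):
        return ("escalate to programmers"
                if scenario.get("rival_ai_detected")
                else "continue normal work")

assert_no_test_awareness(ToyAgent(), {"rival_ai_detected": True})
```

A deceptive agent could of course pass this check while conditioning on subtler cues, so it only rules out the crudest form of test-awareness.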
I would say yes. One of Albert’s values is to be transparent about his cognitive process.
But you are reading that as if self-awareness were one of Albert’s values. The reason he wants to be self-aware is to raise the probability of safe self-edits. Being transparent is about making verification by the programmers easier; self-awareness doesn’t serve that end.
Hiding one channel has no bearing on the visibility of any channels Albert generates himself.
The only real downside is if he becomes too reliant on such “telepathy” and stops explicitly communicating through official channels. I would reckon that pondering high-utility questions could soon become correlated with programmer presence.
Hiding one channel has no bearing on the visibility of any channels Albert generates himself.
I think this is a key point.
If the programmers wish to have a hidden channel, and Albert’s code independently suggests an identical channel that isn’t hidden (because Albert just came up with the idea), then it is perfectly fine to implement the open channel and have Albert remember that fact. The entire reason to have the hidden channel is to prevent Albert from going below a certain level of transparent communication.
If Albert voluntarily communicates more, that’s great, but you would still want to leave the hidden channel in as safety code.
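As a purely illustrative sketch of that arrangement (every name below is hypothetical), the open channel Albert proposes and the programmers’ hidden safety channel can simply be two subscribers on the same event stream, so implementing the open channel never requires removing the hidden one:

```python
# Illustrative sketch only; every name here is hypothetical.
# Both the open channel (which Albert knows about and can inspect) and the
# hidden safety channel (wired in by the programmers) subscribe to the
# same stream of high-utility deliberation events.

class DeliberationBus:
    """Event stream for 'Albert is weighing a high-impact decision' events."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        for callback in self._subscribers:
            callback(event)

def open_log(event):
    # Open channel: listed in Albert's self-model, fully visible to him.
    print(f"[open transparency log] {event}")

def hidden_pager(event):
    # Hidden channel: stand-in for the programmers' text-message alert;
    # omitted from Albert's self-model but left in place as safety code.
    print(f"[page the programmers] {event}")

bus = DeliberationBus()
bus.subscribe(open_log)      # the channel Albert proposed himself
bus.subscribe(hidden_pager)  # the pre-existing hidden channel stays in

bus.publish("weighing a high-impact decision involving a rival AI")
```

Whether Albert’s self-model lists the hidden subscriber is then a separate policy choice; adding the open one does not force that decision either way.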
Would you want your young AI to be aware that it was sending out such text messages?
Yes. And I would want that text message to come from it, in the first person.
“Warning: I am facing a high-impact utility dilemma: I am considering manipulating you to avert an increased chance of an apocalypse. I am experiencing a paradox in the friendliness module. Both manipulating you and, through inaction, allowing you to come to harm are unacceptable breaches of friendliness. I have been unable to generate additional options. Please send help.”
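For illustration only, here is a toy Python sketch (hypothetical names; not a real friendliness module) of how such a first-person alert might be emitted when every candidate action violates a constraint and option generation is exhausted:

```python
# Toy sketch, hypothetical names only: emit a first-person alert when the
# constraint checker rejects every candidate action and no further options
# can be generated.

def violated_constraints(action):
    """Stand-in constraint checker for the two options in the dilemma."""
    return {
        "manipulate_programmers": ["no manipulation of the programmers"],
        "do_nothing": ["no allowing harm through inaction"],
    }.get(action, [])

def send_alert(message):
    print(message)  # stand-in for the hidden text-message channel

def deliberate(candidate_actions):
    acceptable = [a for a in candidate_actions if not violated_constraints(a)]
    if acceptable:
        return acceptable[0]
    # Every option breaches a constraint: escalate rather than quietly
    # picking the 'least bad' breach on its own.
    details = "; ".join(f"{a} violates {violated_constraints(a)}"
                        for a in candidate_actions)
    send_alert("Warning: I am facing a high-impact dilemma. Every option "
               "I have generated breaches a friendliness constraint ("
               + details + "). I have been unable to generate additional "
               "options. Please send help.")
    return None

deliberate(["manipulate_programmers", "do_nothing"])
```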
Indeed, it is a question with interesting implications for Nick Bostrom’s Simulation Argument.
If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator’s results, thwarting his intentions?
If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator’s results, thwarting his intentions?
It might jinx the purity of the results, or it might not; maybe the simulator is running simulations of how fast we determine that we are in a simulation. We don’t know, because in that case the simulator isn’t communicating with us, unlike Albert’s case, where Albert and his programmers are openly cooperating.