Has there been any effort put into finding a “least acceptable” value function, one that we hope would not annihilate the universe or turn it degenerate, even if the outcome itself is not ideal? My example would be to try to teach a superintelligence to value all other agents facing surmountable challenges in a variety of environments. The degenerate case is that, if it does not value the real world, it will simply simulate all agents in a zoo. However, if the simulations are of high fidelity, maybe that’s not literally the worst thing. Plus, the zoo, to truly be a good test of the agents, would approach being invisible.
This doesn’t select for humanlike minds. You don’t want vast numbers of Ataribots similar to current RL, playing games like Pong and Pac-Man (and a trillion other autogenerated games sampled from the same distribution).
Even if you could somehow ensure it was human minds playing these games, the line between a fun game and total boredom is complex and subtle.
That is a very fair criticism. I didn’t mean to imply this is something I was very confident in, but was interested in for three reasons:
1) This value function aside, is this a workable strategy, or is there a solid reason for suspecting the solution is all-or-nothing? Is it reasonable to ‘look for’ our values with human effort, or does this have to be something searched for using algorithms?
2) It sort of gives a flavor to what’s important in life. Of course the human value function will be a complicated mix of different sensory inputs, reproduction, and goal seeking, but I felt like there’s a kernel in there where curiosity is one of our biggest drivers. There was a post here a while back about someone’s child being motivated first and foremost by curiosity.
3) An interesting thought occurs to me: suppose we do create a deferential superintelligence. If its cognitive capacities far outpace those of humans, does that mean the majority of consciousness in the universe is from the AI? If so, is it strange to ask whether it is happy? What is it like to be a god with the values of a child? Maybe I should make a separate comment about this.
At the moment, we don’t know how to make an AI that does something simple like making lots of diamonds.
It seems plausible that making an AI that copies human values is easier than hardcoding even a crude approximation to human values. Or maybe not.
The obvious option in this class is to try to destroy the world in a way that doesn’t send out an AI to eat the lightcone, since that lightcone might contain aliens who could have a better shot.
I am really not a fan of this option.