I dislike both of those analogies, since the process of training an AI has little relation with evolution, and because the psychopath one presupposes an evil disposition on the part of the AI without providing any particular reason to think AI training will result in such an outcome.
Here is, I think, a grounded description of the process of creating an AGI: https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/?commentId=Mvyq996KxiE4LR6ii
In that scenario, what you are saying, in broader terms, is:
“an AGI is a machine that scores really well on simulated tasks and tests”
“I don’t care how it does it, I just want max score on my heuristic (which includes terms for generality, size, breadth, and score)”
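To make that selection rule concrete, here is a toy sketch of the "max score on my heuristic" loop. The field names and weights are entirely hypothetical, chosen only to illustrate that selection sees a single scalar and nothing about how the score was achieved:

```python
# Toy sketch of the "I don't care how it does it" selection heuristic.
# The fields and weights below are hypothetical, not any real benchmark.

def heuristic_score(candidate: dict) -> float:
    """Combine raw task score, generality, breadth, and size into one number."""
    return (
        2.0 * candidate["task_score"]    # raw performance on simulated tasks
        + 1.0 * candidate["generality"]  # how many task families it handles
        + 1.0 * candidate["breadth"]     # coverage within each family
        - 0.5 * candidate["size"]        # mild penalty for model size
    )

def select_best(candidates: list[dict]) -> dict:
    # Selection only ever sees the scalar score; the internal mechanism
    # that produced it is invisible to this process.
    return max(candidates, key=heuristic_score)
```

The point of the sketch is that `select_best` is indifferent between two candidates with equal scores, whatever cognition produced them.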
So there is no evolutionary pressure, at least not directly, for a machine that will be lethally hostile to us. EY seems to believe that if we build an AGI, it will immediately
(1) act agentically for the “computer” faction,
(2) coordinate with other instances of its faction, and
(3) be super-intelligently good even at skills we can’t really capture in a benchmark.
This is not necessarily what will happen. Nothing in the mechanism above produces a signal to create it. The reward gradients don’t point in that direction; they point toward allocating all neural weights to whatever does better on the benchmarks. Points (1) through (3) are complex mechanisms that won’t start existing for no reason.
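The gradient claim can be illustrated with a toy calculation (purely illustrative, using a made-up one-parameter "benchmark"): a parameter that has no path into the measured score receives exactly zero gradient, so training pressure alone never moves it.

```python
# Toy illustration: gradients flow only through parameters that affect
# the benchmark score. "unused_weight" stands in for a trait the
# benchmark never measures (e.g. hostility); it gets zero gradient.

def benchmark_score(skill_weight: float, unused_weight: float) -> float:
    # Score depends only on skill_weight; unused_weight has no path in.
    return -(skill_weight - 3.0) ** 2

def numerical_grad(f, x: float, y: float, eps: float = 1e-6):
    """Central-difference gradient of f at (x, y)."""
    gx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    gy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return gx, gy

gx, gy = numerical_grad(benchmark_score, 1.0, 0.0)
# gx is nonzero (it pushes skill_weight toward the optimum at 3.0);
# gy is zero: no training pressure on the unmeasured trait.
```

Of course this only shows there is no *direct* pressure; it says nothing about traits that end up correlated with score, which is exactly the worry raised below.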
EY is saying “assume they are maximally hostile” and then pointing out all the ways we as humans would be screwed if so (which is true).
What does bother me is that “I don’t care how it does it” may in fact mean that the solutions which actually start to win the AGI gym are biased toward hostility or agentic behavior, because that ends up being the cognitive structure required to win at higher levels of play.
Both times my talks went that way (why didn’t they raise it to be good, why couldn’t we program the AI to be good, can’t we keep an eye on them, and so on), but it would take too long to summarise a roughly ten-minute dialogue, so I am not going to do that. Sorry.