zulupineapple comments on Against Instrumental Convergence

zulupineapple 28 Jan 2018 9:14 UTC
3 points
“AlphaGo Zero was not a (uniformish) random search on Go programs, and humans were not a (uniformish) random search on creatures.”
I’d classify both of those as random programs though. AlphaZero is a random program from the set of programs that are good at playing Go (and that satisfy some structure set by the creators). Humans are random machines from the set of machines that are good at not dying. The searches aren’t uniform, of course, but they are not intentional enough for it to matter.
In particular, AlphaZero was not selected in such a way that exhibiting instrumental convergence would benefit it, and therefore it most likely does not exhibit instrumental convergence. Suppose there was a random modification to AlphaZero that would make it try to get more computational resources, and that this modification was actually made during training. The modified version would play against the original. Since the modification would not actually help it win in the simulated environment, the modified version would most likely lose and be discarded. If the modified version did end up winning, it would be purely by chance.
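A rough sketch of that selection argument, as a toy simulation rather than anything resembling the real training pipeline. Everything here is an assumption made for illustration: the function names, the 400-game match, the 55% acceptance threshold, and the idea that a modification can be summarized by a single per-game win probability against the original.

```python
import random

def survives_selection(win_prob, games=400, threshold=0.55, rng=random):
    """Play `games` games against the original; keep the variant only if
    its win rate reaches `threshold`. All numbers are illustrative."""
    wins = sum(rng.random() < win_prob for _ in range(games))
    return wins / games >= threshold

def survival_rate(win_prob, trials=2000, seed=0):
    """Fraction of independent matches in which a variant with the given
    per-game win probability would be kept."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    kept = sum(survives_selection(win_prob, rng=rng) for _ in range(trials))
    return kept / trials

if __name__ == "__main__":
    # Hypothetical variants: one whose change does nothing for winning,
    # one whose change costs it a little, and one that actually helps.
    for label, p in [("neutral", 0.50), ("costly", 0.45), ("helpful", 0.60)]:
        print(f"{label:8s} win_prob={p:.2f} kept in {survival_rate(p):.1%} of matches")
```

Under these assumptions, a resource-seeking change that does nothing for win rate is kept only in the rare matches where it gets lucky, while a change that really improves play is kept almost every time.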
The case of humans is more complicated, since the “training” does reward self-preservation. Curiously, this self-preservation seems to be its own goal, and not a subgoal of some other desire, as instrumental convergence would predict. Also, human self-preservation only works in a narrow sense. You run from a tiger, but you don’t always appreciate long-term and low-probability threats, presumably because you were not selected to appreciate them. I suspect that concern for these non-urgent threats does not correlate strongly with IQ, unlike what instrumental convergence would predict.
- query 28 Jan 2018 19:58 UTC
  3 points
  Parent
  I was definitely very confused when writing the part you quoted. I think the underlying thought was that the processes of writing humans and of writing AlphaZero are very non-random; i.e., even if there’s a random number generated somewhere as part of the process, there are other things going on that highly constrain the search space, and those processes make use of “instrumental convergence” (stored resources, intelligence, putting the hard drives in safe locations). Then I can understand your claim as “instrumental convergence may occur in guiding the search for/construction of an agent, but there’s no reason to believe that agent will then do instrumentally convergent things.” I think that’s not true in general, but it would take more words to defend.