Or do you mean that neural networks would develop an indirect goal as a side product of training conditions, or via some hidden variable?
This one: I mean that, given the way we train AIs, the things that emerge will be things that pursue goals, at least in some weak sense. So, e.g., suppose you’re training an AI to write valid math proofs via way 2. Probably the best way to do that is to gain a bunch of knowledge about math, use your computation efficiently, figure out good ways of reasoning, etc. And the idea would be that as the system gets more advanced, it’s able to pursue these goals more and more effectively, which ends up disempowering humans (because we’re using a bunch of energy that could instead be devoted to running its computations).
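To make the "pursues goals in a weak sense" point a bit more concrete, here is a minimal sketch of the kind of setup being described, assuming "way 2" means something like outcome-only training, where the system is rewarded purely on whether its final proof checks out and gets no supervision on the intermediate reasoning. All names here (ToyProver, check_proof) are hypothetical stand-ins, not anyone's actual system:

```python
# Minimal illustrative sketch, assuming "way 2" = outcome-only reward on
# proof validity. Everything below is a hypothetical stand-in.

import random

def check_proof(proof: str) -> bool:
    # Stand-in for a real proof checker (e.g. a Lean/Coq verifier).
    # Here it is just a noisy length check so the loop runs end to end.
    return len(proof) > 5 and random.random() < 0.5

class ToyProver:
    """A stand-in policy: samples proof strings and keeps whatever
    got rewarded. No claim that real systems work this way."""
    def __init__(self):
        self.best = ""

    def sample_proof(self, theorem: str) -> str:
        # Mutate the best proof found so far, or start from scratch.
        base = self.best or "axiom;"
        return base + random.choice(["step;", "lemma;", ""])

    def update(self, proof: str, reward: float):
        # Outcome-only feedback: the policy only sees the final score,
        # never which internal strategy helped. That is the crux of the
        # argument above: whatever internal tendencies raise the score
        # (gathering math knowledge, using compute efficiently, better
        # ways of reasoning) are exactly what gets reinforced.
        if reward > 0:
            self.best = proof

prover = ToyProver()
for step in range(100):
    proof = prover.sample_proof("example theorem")
    reward = 1.0 if check_proof(proof) else 0.0
    prover.update(proof, reward)

print("best proof found:", prover.best)
```

The worry in the passage above is then about what happens when the thing in the ToyProver slot is far more capable: the same selection pressure that rewards "acquire knowledge, reason well, use resources efficiently" keeps applying, just at a scale where those instrumental tendencies start competing with human uses of the same resources.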