Yeah, took me a bit of time to figure that out also. The solution where the AI builds an enormous amount of defences around itself just seemed quite imperfect: an asteroid might hit it before it builds the defences, it might be in a simulation that gets shut down...
I expect the presence of rogue behaviour to depend on the relation between the learning algorithm and the learned data, though.
Suppose the learning algorithm builds up the intelligence by adjusting data in some Turing-complete representation, e.g. adjusting weights in a sufficiently advanced neural network which can have its weights set up so that the network is intelligent. Then the code that adjusts those parameters is not really part of the AI; it is there for bootstrapping purposes, essentially, and the AI implemented in the neural network should not want to press the reward button unless it wants to self-modify in precisely the way in which the reward would modify it.
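To make the separation concrete, here is a purely hypothetical toy sketch (not any real system's training code): the learned weights are one thing, the update code is another, and only the update code ever sees the error/reward signal. The function and variable names are mine, just for illustration.

```python
import random

def model(weights, x):
    # The "AI" proper: just the learned weights applied to an input.
    # Nothing in here refers to rewards or to the update code below.
    return sum(w * xi for w, xi in zip(weights, x))

def update(weights, x, target, lr=0.01):
    # The bootstrapping/learning code, outside the model: it computes the
    # error signal and nudges the weights accordingly.
    pred = model(weights, x)
    grad = [2.0 * (pred - target) * xi for xi in x]
    return [w - lr * g for w, g in zip(weights, grad)]

weights = [random.random() for _ in range(3)]
for _ in range(1000):
    x = [random.random() for _ in range(3)]
    target = sum(x)                  # toy task: learn to sum the inputs
    weights = update(weights, x, target)
```

In this picture, "pressing the reward button" would just mean feeding the update code a signal that rewrites the weights, which the network itself has no reason to want unless it wants to be modified in exactly that way.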
What I expect is gradual progress: settling on the approaches and parameters that make it easy to teach the AI to do things, gradually improving how the AI learns, etc. You need to keep in mind that there's a very powerful, well-trained neural network on one side of the teaching process, actively trying to force its values into a fairly blank network on the other side, which to begin with probably doesn't even run in real time. Expecting the latter to hack into the former, and not vice versa, strikes me as magical, sci-fi-type thinking. Just because it runs on a computer doesn't grant it superpowers.