I’m having a very hard time modelling how different AI types would act in extreme scenarios like that. I’m surprised there isn’t more written about this, because it seems extremely important to whether UFAI is even a threat at all. I would be very relieved if that was the case, but it doesn’t seem obvious to me.
Particularly I worry about AIs that predict future reward directly, and then just take the local action that predicts the highest future reward. Like is typically done in reinforcement learning. An example would be Deepmind’s Atari playing AI which got a lot of press.
I don’t think AIs with entire world models that use general planning algorithms would scale to real world problems.Too much irrelevant information to model, too large a search space to search.
As they train their internal model to predict what their reward will be in x time steps, and as x goes to infinity, they care more and more about self preservation. Even if they have already hijacked the reward signal completely.
I’m having a very hard time modelling how different AI types would act in extreme scenarios like that. I’m surprised there isn’t more written about this, because it seems extremely important to whether UFAI is even a threat at all. I would be very relieved if that was the case, but it doesn’t seem obvious to me.
Particularly I worry about AIs that predict future reward directly, and then just take the local action that predicts the highest future reward. Like is typically done in reinforcement learning. An example would be Deepmind’s Atari playing AI which got a lot of press.
I don’t think AIs with entire world models that use general planning algorithms would scale to real world problems.Too much irrelevant information to model, too large a search space to search.
As they train their internal model to predict what their reward will be in x time steps, and as x goes to infinity, they care more and more about self preservation. Even if they have already hijacked the reward signal completely.