Rob Bensinger comments on AGI Ruin: A List of Lethalities

Rob Bensinger 9 Jun 2022 0:59 UTC
6 points
3
AlphaGo doesn’t care about being unplugged in the middle of a game (unless that dynamic was part of its training data). It cares about the platonic game of go, not about the instantiated game it’s currently playing.
What if the programmers intervene mid-game to give the other side an advantage? Does a Go AGI, as you’re thinking of it, care about that?
I’m not following why a Go AGI (with the ability to think about the physical world, but a utility function that only cares about states of the simulation) wouldn’t want to seize more hardware, so that it can think better and thereby win more often in the simulation; or gain control of its hardware and directly edit the simulation so that it wins as many games as possible as quickly as possible.
Why would having a utility function that only assigns utility based on X make you indifferent to non-X things that causally affect X? If I only terminally cared about things that happened a year from now, I would still try to shape the intervening time because doing so will change what happens a year from now.
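To make the point above concrete, here is a minimal toy sketch. Everything in it is a made-up assumption for illustration: the action names, the win probabilities, and the helper functions are hypothetical and do not describe any real system. The utility function looks only at the final game outcome, yet the expected-utility calculation still favors the action that changes the world outside the game.

```python
# Hypothetical toy model: action names and probabilities are illustrative assumptions.
# Each action is taken *before* the game; it only matters through its effect on winning.
WIN_PROB = {"wait": 0.5, "acquire_compute": 0.9}


def terminal_utility(won: bool) -> float:
    """Utility depends only on the final game outcome (the 'X')."""
    return 1.0 if won else 0.0


def expected_utility(action: str) -> float:
    """Expected terminal utility if the agent takes `action` beforehand."""
    p = WIN_PROB[action]
    return p * terminal_utility(True) + (1 - p) * terminal_utility(False)


def best_action() -> str:
    """Pick the action with the highest expected terminal utility."""
    return max(WIN_PROB, key=expected_utility)


print(best_action())
```

Nothing in `terminal_utility` mentions compute, but the maximizer still chooses the action that acquires it, because that action causally raises the chance of the outcome it does care about.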
(This is maybe less clear in the case of shutdown, because it’s not clear how an agent should think about shutdown if its utility is defined over states of its simulation. So I’ll set that particular case aside.)
- David Johnston 9 Jun 2022 1:03 UTC
  4 points
  2
  A Go AI that learns to play Go via reinforcement learning might not “have a utility function that only cares about winning Go”. Using standard utility theory, you could observe its actions and try to rationalise them as if they were maximising some utility function, and the utility function you come up with probably wouldn’t be “win every game of Go you start playing” (what you actually come up with will depend, presumably, on algorithmic and training regime details). The reason the utility function is slippery is that the AI is fundamentally an adaptation executor, not a utility maximiser.