The reason we’re so concerned with instrumental convergence is that we’re usually thinking of an AGI that can recursively self-improve until it can outmaneuver all of humanity and do whatever it wants. If it’s a lot smarter than us, any benefit we could give it is small compared to the risk that we’ll try to kill it or create more AGIs that will.
The future is hard to predict, which is why it's safest to eliminate any hard-to-predict parts that might actively try to kill you, if you can. If an AGI isn't that capable, we're not that concerned. But an AGI will have many ways to improve itself relatively rapidly and steadily become more capable.
The usual rebuttal at this point is "just unplug it". But we'd expect even a decently smart machine to pretend to be friendly and aligned until it has some scheme in place that prevents us from unplugging it.
Your argument for instrumental rationality converging on being nice only applies when you're on a roughly even playing field, where you can't just win the game solo if you decide to.