There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two. You can imagine a version of Stockfish which does that (a chessplayer which, if it’s sure it can win anyways, will start letting you have a pawn or two), but it’s not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap) is implicit in its generic general search through chess outcomes.
This is importantly wrong because the example is in the context of an analogy:
getting some pawns : Stockfish : Stockfish’s goal of winning the game :: getting a sliver of the Sun’s energy : superintelligence : the superintelligence’s goals
The analogy is presented as forceful and unambiguous, but it is not. It’s instead an example of a system being grossly more capable than humans in some domain, and not opposing a somewhat orthogonal goal.
It’s forceful and unambiguous because Stockfish’s victory over the other player terminates the other player’s goals, whatever those may be: no matter what your goals during the game may be, you can’t pursue them once the game is over (and you’ve lost). Available joules are zero-sum in the same way that playing a chess game is zero-sum.
The analogy only goes through if you really double down on the ‘goal’ of capturing some pawns as intrinsically valuable, so even the subsequent defeat is irrelevant. At which point, you’re just unironically making the New Yorker cartoon joke: “Yes, the planet got destroyed. But for a beautiful moment in time we created a lot of value for [capturing 3 pawns].”
I am rather doubtful that humanity has any (ahem) terminal goals which a hypothetical trade of all future joules/life would maximize, but if you think humanity does have a short-term goal or value akin to the ‘3 pawns capture’ achievement, which we could pursue effectively while allowing superintelligences to take over and would choose to do so both ex ante & ex post, despite the possible consequences, you should definitely say what it is, because capturing 3 pawns is certainly not a compelling analogy of a goal worth pursuing at the cost of further losing the game.
To me this looks like circular reasoning: this example supports my conceptual framework because I interpret the example according to the conceptual framework.
Instead, I notice that Stockfish in particular has some salient characteristics that go against the predictions of the conceptual framework:
It is indeed superhuman.
It is not the case that once Stockfish ends the game that’s it. I can rewind Stockfish. I can even make one version of Stockfish play against another. I can make Stockfish play a chess variant. Stockfish doesn’t annihilate my physical body when it defeats me.
It is extremely well aligned with my values. I mostly use it to analyze games I’ve played against other people my level.
If Stockfish wants to win the game and I want an orthogonal goal, like capturing its pawns, this is very feasible.
Now, does this even matter for considering whether a superintelligence would or wouldn’t trade? Not that much, it’s a weak consideration. But insofar as it’s a consideration, does it really convince someone who doesn’t already buy the frame? Not me.
The paragraph you quoted
is saying that when you make a [thing that achieves very impressive things / strongly steers the world], it probably [in general sucks up all the convergent instrumental resources] because that’s simpler than [sucking up all the convergent instrumental resources except in certain cases unrelated to its terminal goals].
Humanity getting a sliver of the Sun’s energy for the next million years would be a noticeable waste of convergent instrumental resources from the AI’s perspective. Humanity getting a sliver of the Sun’s energy while the nanobots are infecting our bloodstream, so that we won’t panic, and then later sucking up all the Sun’s energy, is just good tactics; letting you sac your bishop for a pawn for no reason is analogous.
You totally can rewrite Stockfish so that it genuinely lets you win material, but is still unbeatable. You just check: is the evaluation > +20 for Stockfish right now, and will it stay > +15 if I sac this pawn for no benefit? If so, sac the pawn for no benefit. This would work. The point is it’s more complicated, and you have to know something about how Stockfish works, and it’s only stable because Stockfish doesn’t have robust self-improvement optimization channels.
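To make the "it's more complicated" point concrete, here is a rough sketch of that check as a wrapper around an ordinary move chooser. Nothing here is Stockfish's actual code or API: `choose_generous_move`, `evaluate`, `apply_move`, and `gift_moves` are hypothetical names invented for illustration, and only the +20 / +15 thresholds come from the comment above.

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")  # a position (hypothetical representation)
M = TypeVar("M")  # a move (hypothetical representation)


def choose_generous_move(
    position: P,
    best_move: M,
    gift_moves: Iterable[M],
    evaluate: Callable[[P], float],
    apply_move: Callable[[P, M], P],
    current_threshold: float = 20.0,
    floor_threshold: float = 15.0,
) -> M:
    """Return a pointless material giveaway if the engine stays safely ahead.

    `evaluate` is assumed to return the evaluation in pawns from the
    engine's point of view. `gift_moves` is assumed to be a list of moves
    that give up material for no benefit. Both are illustrative stand-ins,
    not real engine calls.
    """
    # Only consider being generous when already overwhelmingly winning.
    if evaluate(position) > current_threshold:
        for move in gift_moves:
            # Give the material away only if the lead stays above the floor.
            if evaluate(apply_move(position, move)) > floor_threshold:
                return move
    # Otherwise fall back to the default: fight for everything.
    return best_move


# Toy model: a "position" is just its evaluation, and a "move" is the
# number of pawns of evaluation it costs the engine.
def toy_evaluate(pos: float) -> float:
    return pos


def toy_apply(pos: float, cost: float) -> float:
    return pos - cost


generous = choose_generous_move(25.0, 0.0, [1.0], toy_evaluate, toy_apply)
too_costly = choose_generous_move(25.0, 0.0, [12.0], toy_evaluate, toy_apply)
not_winning_enough = choose_generous_move(18.0, 0.0, [1.0], toy_evaluate, toy_apply)
```

Note that the default behavior is the one-line fallback at the end; the generosity is the extra branch, extra thresholds, and extra knowledge about what counts as a giveaway, all layered on top.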