Individual humans do make out much better when they get to select between products from competing companies rather than monopolies, benefitting from companies going out of their way to demonstrate when their products are verifiably better than rivals’. Humans get treated better by sociopathic powerful politicians and parties when those politicians face the threat of election rivals (e.g. no famines). Small states get treated better when multiple superpowers compete for their allegiance. Competitive science with occasional refutations of false claims produces much more truth for science consumers than intellectual monopolies. Multiple sources with secret information are more reliable than one.
It’s just routine for weaker, less sophisticated parties to do better in both assessment of choices and realized outcomes when multiple better-informed or more powerful parties compete for their approval vs just one monopoly/cartel.
Also, a flaw in your analogy is that schemes that use AIs as checks and balances on each other don’t mean more AIs. The choice is not between monster A and monsters A plus B, but between two copies of monster A (or a double-size monster A), and a split of one A and one B, where we hold something of value that we can use to help throw the contest to either A or B (or successors further evolved to win such contests). In the latter case there’s no more total monster capacity, but there’s greater hope of our influence being worthwhile and selecting the more helpful winner (which we can iterate some number of times).
So, the analogy here is that there’s hundreds (or more) of Godzillas all running around, doing whatever it is Godzillas want to do. Humanity helps out whatever Godzillas humanity likes best, which in turn creates an incentive for the Godzillas to make humanity like them.
THIS DOES NOT BODE WELL FOR TOKYO’S REAL ESTATE MARKET.
Still within the analogy: part of the literary point of Godzilla is that humanity’s efforts to fight it are mostly pretty ineffective. In inter-Godzilla fights, humanity is like an annoying fly buzzing around. The humans just aren’t all that strategically relevant. Sure, humanity’s assistance might add some tiny marginal advantage, but from a Godzilla’s standpoint that advantage is unlikely to be enough to balance the tactical/strategic disadvantages of trying not to step on people.
… and that all seems like it should carry over directly to AI, once AI gets to-or-somewhat-past human level, and definitely by the time we get to strongly superhuman intelligence. Even with just human level, the scaling/coordination/learning advantages of being able to cheaply copy a mind are probably enough for the AIs to reasonably-quickly achieve strategic dominance by enough margin that humanity’s preferences are not particularly relevant. (Assuming that the AI isn’t prohibitively expensive to run—but that seems pretty likely to be true under most plausible paths. For instance, if human-level AI is produced by anything like today’s ML, then training costs will dominate and the systems will be relatively cheap to run or fine-tune.)
(There are also some alignment-specific problems with this scheme which the Godzilla analogy doesn’t highlight. I’m not going into them here because this post is specifically about the Godzilla issues. But I don’t want to give people the impression that this plan would be fine in a world where humanity has sufficient bargaining power; the lack of bargaining power is only one failure mode.)
I was going to make a comment to the effect that humans are already a species of Godzilla (humans aren’t safe, human morality is scary, yada yada), only to find you making the same analogy, but with an optimistic slant. :)
Competition between the powerful can lead to the ability of the less powerful to extract value. It can also lead to the less powerful being more ruthlessly exploited by the powerful as a result of their competition. It depends on the ability of the less powerful to choose between the more powerful. I am not confident humanity or parts of it will have the ability to choose between competing AGIs.
This happens during fine-tuning training already, selecting for weights that give the higher human-rated response of two (or more) options. It’s a starting point that can be lost later on, but we do have it now with respect to configurations of weights giving different observed behaviors.
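For concreteness, here is a minimal sketch of what "selecting for weights that give the higher human-rated response" can look like as a training signal. It assumes a Bradley-Terry style pairwise objective, which is one common choice and not necessarily what any particular lab uses; the names `preference_loss`, `score_chosen` and `score_rejected` are hypothetical and only for illustration.

```python
import math


def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred response wins.

    Assumption (not from the comment above): a Bradley-Terry model,
    P(chosen beats rejected) = sigmoid(score_chosen - score_rejected).
    The scores stand in for whatever the current weights assign to
    each of the two candidate responses.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Weights that score the human-preferred response higher get a lower loss,
# so gradient descent on this loss pushes toward such weights.
agrees_with_rater = preference_loss(2.0, 0.0)
disagrees_with_rater = preference_loss(0.0, 2.0)
print(agrees_with_rater < disagrees_with_rater)
```

The human rater only enters through which response is labeled "chosen", so this is the sense in which humans currently get to pick between competing configurations of weights.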