It is not at all clear to me that most of the atoms in a planet could be harnessed for technological structures, or that doing so would be energy efficient. Most of the mass of an earthlike planet is iron, oxygen, silicon and magnesium, and while useful things can be made out of these elements, I would strongly worry that other elements that are needed also in those useful things will run out long before the planet has been disassembled. By historical precedent, I would think that an AI civilization on Earth will ultimately be able to use only a tiny fraction of the material in the planet, similarly to how only a very small fraction of a percent of the carbon in the planet is being used by the biosphere, in spite of biological evolution having optimized organisms for billions of years towards using all resources available for life.
The scenario of a swarm of intelligent drones eating up a galaxy and blotting out its stars I think can empirically be dismissed as very unlikely, because it would be visible over intergalactic distances. Unless we are the only civilization in the observable universe in the present epoch, we would see galaxies with dark spots or very strangely altered spectra somewhere. So this isn’t happening anywhere.
There are probably some historical analogs for the scenario of a complete takeover, but they are very far in the past, and have had more complex outcomes than intelligent grey goo scenarios normally portray. One instance I can think of is the Great Oxygenation Event. I imagine an observer back then might have envisioned that the end result of the evolution of cyanobacteria doing oxygenic photosynthesis would be the oceans and lakes and rivers all being filled with green slime, with a toxic oxygen atmosphere killing off all other life. While indeed this prognosis would have been true to a first order approximation—green plants do dominate life on Earth today—the reality of what happened is infinitely more complex than this crude picture suggests. And even anaerobic organisms survive to this day in some niches.
The other historical precedent that comes to mind would be the evolution of organisms that use DNA to encode genetic information using the specific genetic code that is now universal to all life, in whatever pre-DNA world existed at the beginning of life. These seem to have indeed completely erased all other kinds of life (claims of a shadow biosphere of more primitive organisms are all dubious to my knowledge), but also have not resulted in a less complex world.
GoteNoSente
In chess, I think there are a few reasons why handicaps are not more broadly used:
Chess in its modern form is a game of European origin, and it is my impression that European cultures have always valued “equal starting conditions for everyone” more highly than “similar chances for everyone to get their desired outcome”. This might have made use of handicaps less appealing, because with handicaps, the game starts from a position that is essentially known to be lost for one side.
There is no good way to combine handicaps in chess with Elo ratings, making it impossible to have rated handicap games. It is also not easy to use handicap results informally to predict optimal handicap between players who haven’t met (if John can give me a knight, and I can give f7 pawn and move to James, it is not at all clear what the appropriate handicap for John against James would be). This is different in Go.
Material handicaps significantly change the flow of the game (the stronger side can try to just trade down into a winning endgame, and for larger handicaps, this becomes easy to execute), and completely invalidate opening theory. This is different in Go and also in more chess-like games such as Shogi, where I understand handicaps are more popular.
Professional players (grandmasters and above) are probably strong enough to convert even a small material handicap like pawn and move fairly reliably into a win against any human (computers, at a few hundred Elo points above the best humans, can probably give about a pawn to top players and still win at tournament time controls). This implies any handicap system would use only very few handicaps in games between players strong enough that their games are of public interest (in Go, I understand there are probably 3-4 handicap stones between the weakest and the best professionals, and maybe two stones against the best computers). I think that would have been different in the 19th century, when material handicaps in chess were more popular than today.
That said, chess does use handicaps in some settings, but they are not material handicaps. In informal blitz play, time handicaps are sometimes used, often in a format where players start at five minutes for the game and lose a minute if they win, until one of the players arrives at zero minutes. Simultaneous exhibitions and blindfold play are also handicaps that are practiced relatively widely. Judging just by the number of games played in each handicap mode, I’d say though that time handicap is by far the most popular variant at the club player level.
Isn’t the AI box game (or at least its logical core) played out a million times a day between prisoners and correctional staff, with the prisoners losing almost all the time? Real prison escapes (i.e. inmate escapes other than failures to return from sanctioned time outside) are, in my understanding, extremely rare.
I think the most important things that are missing in the paper currently are these three points:
1. Comparison to the best Leela Zero networks.
2. Testing against strong (maybe IM-level) humans at tournament time controls (or a clear claim that we are talking about blitz Elo, since a player who does no explicit tree search does not get better if given more thinking time).
3. Games against traditional chess computers in the low GM/strong IM strength bracket would also be nice to have, although maybe not scientifically compelling. I sometimes do those for fun with LC0 and it is utterly fascinating to see how LC0 with current transformer networks at one node per move manages to very often crush this type of opponent by pure positional play, i.e. in a way that makes winning against these machines look extremely simple.
I do not see why any of these things will be devalued in a world with superhuman AI.
At most of the things I do, there are many other humans who are vastly better at doing the same thing than me. For some intellectual activities, there are machines who are vastly better than any human. Neither of these stops humans from enjoying improving their own skills and showing them off to other humans.
For instance, I like to play chess. I consider myself a good player, and yet a grandmaster would beat me 90-95 percent of the time. They, in turn, would lose on average 8.5-1.5 in a ten game match against a world-championship level player. And a world champion will lose almost all of their games against Stockfish running on a smartphone. Stockfish running on a smartphone, in turn, will lose most of its games against Stockfish running on a powerful desktop computer or against Leela Chess Zero running on something that has a decent GPU. I think those opponents would probably, in turn, lose almost all of their games against an adversary that has infinite retries, i.e. that can target and exploit weaknesses perfectly. That is how far I am away from playing chess perfectly.
And yet, the emergence of narrow superintelligence in chess has increased and not diminished my enjoyment of the game. It is nice to be able to play normally against a human, and to then be able to find out the truth about the game by interactively looking at candidate moves and lines that could have been played using Leela. It is nice to see a commented world championship game, try to understand the comments, make up one’s own mind about them, and then explore using an engine why the alternatives that one comes up with (mostly) don’t work.
If we get superintelligence, that same accessibility of tutoring at beyond the level of any human expert will be available in all intellectual fields. I think in the winning scenario, this will make people enjoy a wide range of human activities more, not less.
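To put rough numbers on the ladder of playing strength described above, here is a minimal Python sketch that converts a score fraction into a rating gap under the standard logistic Elo model. Treating "beats me 90-95 percent of the time" as an expected score (with draws counting half) is my simplifying assumption, and the function names are purely illustrative.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score (win = 1, draw = 0.5) of the higher-rated side
    under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_diff_for_score(score: float) -> float:
    """Inverse of expected_score: the Elo gap implied by a score fraction
    strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Score fractions taken from the comment above; reading them as Elo
    # expected scores is an assumption made for this illustration.
    steps = [
        ("grandmaster vs. me, 90%", 0.90),
        ("grandmaster vs. me, 95%", 0.95),
        ("world-championship level vs. grandmaster, 8.5/10", 0.85),
    ]
    for label, score in steps:
        print(f"{label}: about {rating_diff_for_score(score):.0f} Elo points")
```

Under this model each step on the ladder corresponds to a gap of a few hundred rating points. A score of "almost all games" has no finite Elo equivalent, which is one reason why very lopsided matchups are hard to rate.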
As an additional thought regarding computers, it seems to me that participant B could be replaced by a weak computer in order to provide a consistent experimental setting. For instance, Leela Zero running just the current T2 network (no look-ahead) would provide an opponent that is probably at master-level strength and should easily be able to crush most human opponents who are playing unassisted, but would provide a perfectly reproducible and beatable opponent.
I think having access to computer analysis would allow the advisors (both honest and malicious) to provide analysis far better than their normal level of play, and allow the malicious advisors in particular to set very deep traps. The honest advisor, on the other hand, could use the computer analysis to find convincing refutations of any traps the dishonest advisors are likely to set, so I am not sure whether the task of the malicious side becomes harder or easier in that setup. I don’t think reporting reasoning is much of a problem here, as a centaur (a chess player consulting an engine) can most certainly give reasons for their moves (even though sometimes they won’t understand their own advice and be wrong about why their suggested move is good).
It does make the setup more akin to working with a superintelligence than working with an AGI, though, as the gulf between engine analysis and the analysis that most/all humans can do unassisted is vast.
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.
Are the advisors allowed computer assistance, do the dishonest and the honest advisor know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters would make a large difference potentially in outcome for this type of experiment.
It is possible to play funny games against it, however, if one uses the fact that it is at heart a story telling, human-intent-predicting system. For instance, this here works (human white):
1. e4 e5 2. Ke2 Ke7 3. Ke3 Ke6 4. Kf3 Kf6 5. Kg3 Kg6 6. Kh3 Kh6 7. Nf3 Nf6 8. d4+ Kg6 9. Nxe5# 1-0
A slight advantage in doing computer security research won’t give an entity the ability to take over the internet, by a long shot, especially if it does not have backing by nation state actors. The NSA for instance, as an organisation, has been good at hacking for a long time, and while certainly they can and have done lots of interesting things, they wouldn’t be able to take over the world, probably even if they tried and did it with the backing of the full force of the US military.
Indeed, for some computer security problems, even superintelligence might not confer any advantage at all! It’s perfectly possible, say, that a superintelligence running on a Matrioshka brain a million years hence will find only modest improvements upon current best attacks against the full AES-128. Intelligence allows one to do math better and, occasionally, to find ways and means that side-step mathematical guarantees, but it does not render the adversary omnipotent; an ASI still has to accept (or negotiate around) physical, mathematical and organizational limits to what it can do. In that sense, a lot of the ASI safety debate I think runs on overpowered adversaries, which will in the long run be bad both in terms of achieving ASI safety (because in an overpowered adversary model, real dangers risk remaining unidentified and unfixed) and in terms of realizing the potential benefits of creating AGI/ASI.
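For a sense of scale, the following Python sketch does the back-of-the-envelope arithmetic for exhaustive key search, the baseline against which any attack on AES-128 is measured. The trial rate is a purely hypothetical figure that I chose for illustration; it is not a claim about any real or future hardware.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(key_bits: int, keys_per_second: float) -> float:
    """Expected years for exhaustive key search, assuming the key is found
    after trying half of the key space on average."""
    expected_trials = 2 ** (key_bits - 1)
    return expected_trials / keys_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical rate (an assumption, not a measurement): 10^18 keys/second.
    assumed_rate = 1e18
    years = brute_force_years(128, assumed_rate)
    print(f"AES-128 at {assumed_rate:.0e} keys/s: about {years:.1e} years")
```

Each additional key bit doubles the result, so an attacker needs either an enormous constant-factor advantage or a genuinely better attack; raw speed alone does not get very far.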
To second a previous reply to this, I would expect this will hold for humans as well.
On top of that, mathematically it is perfectly possible for some function to be easy to learn/compute, but the inverse to be hard. For instance, discrete exponentiation is easy to compute in all groups where multiplication is easy to compute, but the inverse function, the discrete logarithm, is hard enough to base cryptography on it, if one picks a suitable group representation (e.g. point groups of secure elliptic curves, or the group of invertible elements of a large safe prime field).
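The asymmetry can be felt even at toy scale. The Python sketch below computes a modular exponentiation with the built-in three-argument `pow` and inverts it by naive trial of exponents. The prime and base are small illustrative choices of mine, far below cryptographic size, and the brute-force loop is only the most naive inversion method, not the best known one.

```python
from typing import Optional


def discrete_exp(g: int, x: int, p: int) -> int:
    """Easy direction: g**x mod p via Python's built-in fast modular pow."""
    return pow(g, x, p)


def discrete_log_bruteforce(g: int, h: int, p: int) -> Optional[int]:
    """Hard direction, done naively: try x = 0, 1, 2, ... until g**x == h (mod p).
    The work grows linearly with p, i.e. exponentially with the bit length of p."""
    value = 1  # invariant: value == g**x mod p at the top of each iteration
    for x in range(p - 1):
        if value == h:
            return x
        value = (value * g) % p
    return None


if __name__ == "__main__":
    p, g = 1_000_003, 2  # toy parameters, chosen only for illustration
    secret = 123_456
    h = discrete_exp(g, secret, p)      # a handful of multiplications
    x = discrete_log_bruteforce(g, h, p)  # up to about a million steps
    print(x is not None and discrete_exp(g, x, p) == h)
```

The forward direction stays fast when the modulus has thousands of bits, whereas this inversion loop becomes hopeless long before that. Better generic algorithms exist, but in well-chosen groups they are still believed to be infeasible.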
Similar examples exist with regards to function learnability for neural networks as well. A simple example of a function that is easy to learn for a neural network but which has a much more difficult to learn inverse is f(x_1, x_2, x_3, ..., x_n) = (x_1 xor x_2, x_2 xor x_3, …, x_{n-1} xor x_n) (for difficulty of learning this, one would assume learning from random samples, and with common multi-label loss functions; with suitable tricks, this does become learnable if the neural network can represent the inverse target function).
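One way to see where the asymmetry comes from is to write both directions down. In the Python sketch below, each output bit of the forward map depends on only two neighbouring input bits, whereas recovering x_k requires a running parity over all earlier outputs. Supplying x_1 as an extra argument is my addition: the forward map sends a bit string and its complement to the same output, so the inverse is only determined once the first bit is fixed.

```python
import random
from itertools import accumulate
from operator import xor
from typing import List


def forward(bits: List[int]) -> List[int]:
    """f(x_1..x_n) = (x_1 xor x_2, ..., x_{n-1} xor x_n): purely local."""
    return [a ^ b for a, b in zip(bits, bits[1:])]


def inverse(first_bit: int, diffs: List[int]) -> List[int]:
    """Recover x from y = f(x) and x_1, using
    x_k = x_1 xor y_1 xor ... xor y_{k-1} (a prefix parity)."""
    return list(accumulate(diffs, xor, initial=first_bit))


if __name__ == "__main__":
    x = [random.randint(0, 1) for _ in range(16)]
    y = forward(x)
    print(inverse(x[0], y) == x)
```

Parities over many bits are a classic example of a target that gradient-based learning from random samples struggles with, which fits the observation that the inverse is the harder direction.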
A final point that I would consider here is that it is possible that for the reverse questions in this task, a privacy protection mechanism kicks in that makes the LLM deny knowledge of the non-celebrity. It seems perfectly possible to me that GPT-4 is lying when it says it doesn’t know about <mother of celebrity>, because it has been instructed to lie about these things in order to protect the privacy of people not considered to be in the public eye.
The playing strength of parrotchess seems very uneven, though. On the one hand, if I play it head-on, just trying to play the best chess I can, I would estimate it even higher than 1800, maybe around 2000 when we regard this as blitz. I’m probably roughly somewhere in the 1900s and on a few tries, playing at blitz speed myself, I would say I lost more than I won overall.
On the other hand, trying to play an unconventional but solid opening in order to neutralize its mostly awesome openings and looking out for tactics a bit while keeping the position mostly closed, I got this game, where it does not look at all stronger than the chat3.5 models, and therefore certainly not 1800-level:
https://lichess.org/study/ymmMxzbj/SpMFmwXH
Nonetheless, the performance of this model at chess is very interesting. None of the other models, including GPT-4, has (with prompting broadly similar to what parrotchess uses) been able to get a good score against me if I just played it as I would play most human opponents, so in that sense it definitively seems impressive to me, as far as chess-playing language models go.
If high-tech aliens did visit us, it would not seem inconceivable that the drones they would send might contain (or are able to produce prior to landing) robotic exploration units based on some form of nanotechnology that we might mistake for biology and more specifically, for pilots. A very advanced robot need not look like a robot.
I also do not find it too worrisome that we do not see Dyson spheres or a universe converted into computronium. It is possible that the engineering obstacles towards either goal are more formidable than the back-of-the-envelope assessments that originated these concepts suggest and that even the grabbiest of aliens do not execute such programs. Maybe even very advanced civilizations convert only a small part of their local system’s mass into civilized matter, just like our biosphere has only converted a small part of Earth, despite billions of years of trying to reproduce as much as possible. These are things where people probably overestimate the amount of information we can wring out of the Fermi paradox.
However, a sizable number of recovered craft would suggest that there is a population of craft in the solar system suffering from some rate of attrition. If so, where would they be coming from? A steady supply line maintained over at least several light years? Or a factory somewhere in the system?
I’ll be intrigued if evidence with at least the verifiability of the Snowden files comes along, not before.
A hardware protection mechanism that needs to confirm permission to run by periodically dialing home would, even if restricted to large GPU installations, brick any large scientific computing system or NN deployment that needs to be air-gapped (e.g. because it deals with sensitive personal data, or particularly sensitive commercial secrets, or with classified data). Such regulation also provides whoever controls the green light a kill switch against any large GPU application that runs critical infrastructure. Both points would severely damage national security interests.
On the other hand, the doom scenarios this is supposed to protect from would, at least as of the time of writing this, by most cybersecurity professionals probably be viewed as an example of poor threat modelling (in this case, assuming the adversary is essentially almighty and that everything they do will succeed on their first try, whereas anything we try will fail because it is our first try).
In summary, I don’t think this would (or should) fly, but obviously I might be wrong. For a point of reference, techniques similar in spirit have been seriously proposed to regulate use of cryptography (for instance, via adoption of the Clipper chip), but I think it’s fair to say they have not been very successful.
Thanks for the information. I’ll try out BT2. Against LazyBot I was just then able to get a draw in a blitz game with 3 seconds increment, which I don’t think I could do within a few tries against an opponent of, say, low grandmaster strength (with low grandmaster strength still being quite far away from superhuman). Since pure policy does not improve with thinking time, I think my chances would be much better at longer time controls. Certainly its lichess rating at slow time controls suggests that T80 is not more than master strength when its human opponents have more than 15 minutes for the whole game.
Self-play elo vastly exaggerates playing strength differences between different networks, so I would not expect a BT2 vs T80 difference of 100 elo points to translate to close to 100 elo playing strength difference against humans.
The LC0 pure policy is most certainly not superhuman. To test this, I just had it (network 791556, i.e. standard network of the current LC0 release) play a game against a weak computer opponent (Shredder Chess Online). SCO plays maybe at the level of a strong expert/weak candidate master at rapid chess time controls (but it plays a lot faster, thereby making generation of a test game more convenient than trying to beat policy-only lc0 myself, which I think should be doable). Result was draw, after lc0 first completely outplayed SCO positionally, and then blundered tactically in a completely won position, with a strong-looking move that had a simple tactical refutation. It then probably still had a very slight advantage, but opted to take a draw by repetition.
I think policy-only lc0 plays worse relative to strong humans than Katago/LeelaZero in Go. I would attribute this to chess being easier to lose by tactical blunder than Go.
I would disagree with the notion that the cost of mastering a world scales with the cost of the world model. For instance, the learning with errors problem has a completely straightforward mathematical description, and yet strong quantum-resistant public-key cryptosystems can be built on it; there is every possibility that even a superintelligence a million years from now will be unable to read a message encrypted today using AES-256 encapsulated using a known Kyber public key with conservatively chosen security parameters.
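To illustrate how short the description of the learning with errors problem is, here is a toy Python sketch that generates LWE samples. All parameters are illustrative and completely insecure, the bounded uniform noise is a simplification of the noise distributions used in practice, and this is plain LWE rather than the structured variant that Kyber builds on.

```python
import random
from typing import List, Tuple


def lwe_secret(n: int, q: int, rng: random.Random) -> List[int]:
    """Draw a uniformly random secret vector s in (Z_q)^n."""
    return [rng.randrange(q) for _ in range(n)]


def lwe_sample(secret: List[int], q: int, noise_bound: int,
               rng: random.Random) -> Tuple[List[int], int]:
    """Return one sample (a, b) with b = <a, s> + e mod q, where e is a
    small error drawn uniformly from [-noise_bound, noise_bound]."""
    a = [rng.randrange(q) for _ in secret]
    e = rng.randint(-noise_bound, noise_bound)
    b = (sum(ai * si for ai, si in zip(a, secret)) + e) % q
    return a, b


if __name__ == "__main__":
    rng = random.Random(2024)
    n, q, noise_bound = 8, 3329, 2  # toy sizes, far too small to be secure
    s = lwe_secret(n, q, rng)
    for a, b in (lwe_sample(s, q, noise_bound, rng) for _ in range(3)):
        print(a, b)
```

Without the error term, recovering the secret from enough samples is plain Gaussian elimination. The entire difficulty comes from the small noise, which is what makes this a good example of a world that is simple to describe and hard to master.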
Similarly, it is not clear to me at all what is even meant by saying that a tiny neural network can perfectly predict the “world” of Go. I would expect that even predicting the mere mechanics of the game, for instance determining that a group has just been captured by the last move of the opponent, will be difficult for small neural networks when examples are adversarially chosen (think of a group that snakes around the whole board, overwhelming the small NN capability to count liberties). The complexity of determining consequences of actions in Go is much more dependent on the depth of the required search than on the size of the game state, and it is easy to find examples on the 19x19 standard board size that will overwhelm any feed-forward neural network of reasonable size (but not necessarily networks augmented with tree search).
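For comparison, this is roughly how conventional code decides whether a group has been captured: a flood fill over the connected stones that collects adjacent empty points. The Python sketch below uses a board encoding of my own choosing (a list of strings with `.` for empty points) and is meant only to show that the number of sequential steps grows with the size of the group, which is the property that makes long, snaking groups awkward for a fixed-depth network.

```python
from collections import deque
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], row: int, col: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing (row, col); return its stones and liberties."""
    color = board[row][col]
    if color == ".":
        raise ValueError("no stone at that point")
    rows, cols = len(board), len(board[0])
    stones: Set[Point] = {(row, col)}
    liberties: Set[Point] = set()
    queue = deque([(row, col)])
    while queue:  # one iteration per stone in the group
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if board[nr][nc] == ".":
                liberties.add((nr, nc))
            elif board[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                queue.append((nr, nc))
    return stones, liberties


def is_captured(board: List[str], row: int, col: int) -> bool:
    """A group is captured when it has no liberties left."""
    return not group_and_liberties(board, row, col)[1]


if __name__ == "__main__":
    board = [
        "XO...",
        "OO...",
        ".....",
        "..XX.",
        ".....",
    ]
    print(is_captured(board, 0, 0), is_captured(board, 3, 2))
```

A loop like this takes as many iterations as the group has stones, so its cost adapts to the position. A feed-forward network has a fixed number of layers and has to fit the same computation into that budget for every position.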
With regards to FOOM, I agree that doom from foom seems like an unlikely prospect (mainly due to diminishing returns on the utility of intelligence in many competitive settings) and I would agree that FOOM would require some experimental loop to be closed, which will push out time scales. I would also agree that the example of Go does not show what Yudkowsky thinks it does (it does help that this is a small world where it is feasible to do large reinforcement learning runs, and even then, Go programs have mostly confirmed human strategy, not totally upended it). But the possibility that if an unaided large NN achieved AGI or weak ASI, it would then be able to bootstrap itself to a much stronger level of ASI in a relatively short time (similar to the development cycle timeframe that led to the AGI/weak ASI itself; but involving extensive experimentation, so neither undetectable nor done in minutes or days) by combining improved algorithmic scaffolding with a smaller/faster policy network does not seem outlandish to me.
Lastly, I would argue that foom is in fact an observable phenomenon today. We see self-reinforcing, rapid, sudden onset improvement every time a neural network during training discovers a substantially new capability and then improves on it before settling into a new plateau. This is known as grokking and well-described in the literature on neural networks; there are even simple synthetic problems that produce a nice repeated pattern of grokking at successive levels of performance when a neural network is trained to solve them. I would expect that fooming can occur at various scales. However, I find the case that a large grokking step automatically happens when a system approaches human-level competence on general problems unconvincing (on the other hand, of course a large grokking step could happen in a system already at human-level competence by chance or happenstance and push into the weak ASI regime in a short time frame).
This seems clearly wrong:
Go is extremely simple: the entire world of Go can be precisely predicted by trivial tiny low depth circuits/programs. This means that the Go predictive capability of a NN model as a function of NN size completely flatlines at an extremely small size. A massive NN like the brain’s cortex is mostly wasted for Go, with zero advantage vs the tiny NN AlphaZero uses for predicting the tiny simple world of Go.
Top go-playing programs utilize neural networks, but they are not neural networks. Monte-Carlo Tree Search boosts their playing strength immensely. The underlying pure policy networks would be strong amateur level when playing against opponents who are unaware that they are playing a pure neural network, but they would lose quite literally every game against top humans. It seems very likely that a purely NN-based player without search would have to be based on a far more complex neural network than the ones we see in, say, Leela Zero or Katago. In addition, top programs like Katago use some handcrafted features (things about the current game state that can be efficiently computed by traditional hand-written code, but would be difficult to learn or compute for a neural network), so they deviate to a significant extent from the paradigm of pure reinforcement learning via self-play from just the rules that AlphaZero proved viable. This, too, significantly improves their playing strength.
Finally, Go has a very narrow (or, with half-integer komi and rulesets that prevent long cycles, non-existent) path to draw, and games last for about 250 moves. That means that even small differences in skill can be reliably converted to wins. I would guess that the skill ceiling for Go (and thereby, the advantage that a superintelligence would have in Go over humans or current go-playing machines) is higher than in most real-life problems. Go is as complicated as the opponent makes it. I would, for these reasons, in fact not be too surprised if the best physically realizable go-playing system at tournament time controls with hardware resources, say, equivalent to a modern-day data center would include a general intelligence (that would likely adjust parameters or code in a more specialized go-player on the fly, when the need arises).
The machines playing chess and go, are a mixed example. I suck at chess, so the machines better than me have already existed decades ago. But at some moment they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember it correctly, the machine that is superhuman at go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when thinking about takeoff.
The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers at a modest budget, and the best machine built (Deep Thought) was grandmaster-level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some automatic parameter tuning, and alpha-beta search with some manually designed depth extension heuristics. Over the years, people designed better and better evaluation functions and invented various tricks to reduce the amount of work spent on unpromising branches of the game tree.
Long into the 1990s, many strong players were convinced that this approach would not scale to world championship levels, because they believed that play competitive at the world champion level required correctly dealing with various difficult strategic problems, and that working within the prevailing paradigm would only lead to engines that were even more superhuman at tactics than had been already observed, while still failing against the strongest players due to lack of strategic foresight. This proved to be wrong: classical chess programs reached massively superhuman strength on the traditional approach to chess programming, and this line of programs was completely dominant and still improving up to about the year 2019.
In late 2017, a team at DeepMind showed that throwing reinforcement learning and Monte Carlo Tree Search at chess (and various other games) could produce a system playing at an even higher level than the then-current version of Stockfish running on very strong hardware. Today, the best engines use either this approach or the traditional approach to chess programming augmented by incorporation of a very lightweight neural network for accurate positional evaluation.
For Go, there was hardly any significant progress from about the early 90s to the early 2010s: programs were roughly at the level of a casual player who had studied the game for a few months. A conceptual breakthrough (the invention of Monte-Carlo Tree Search) then brought them to a level equivalent in chess maybe to a master by the mid-2010s. DeepMind’s AlphaGo system then showed in 2016 that reinforcement learning and MCTS could produce a system performing at a superhuman level when run on a very powerful computer. Today, programs based on the same principles (with some relatively minor go-specific improvements) run at substantially higher playing strength than AlphaGo on consumer hardware. The vast majority of strong players were completely convinced in 2016 that AlphaGo would not win its match against Lee Sedol (a world-class human player).
Chess programs had been superhuman at the things they were good at (spotting short tactics) for a long time before surpassing humans in general playing strength, arguably because their weaknesses improved less quickly than their strengths. Their weaknesses are in fact still in evidence today: it is not difficult to construct positions that the latest versions of LC0 or Stockfish don’t handle correctly, but it is very difficult indeed to exploit this in real games. For Go programs, similar remaining weak spots have recently been shown to be exploitable in real games (see https://goattack.far.ai/), although my understanding is that these weaknesses have now largely been patched.
I think the general lesson is this: AI performance at a task will be determined by the aspects of that task that the AI handles best when the AI is far below human level, and by the aspects that it handles worst when it is at or above human level. This slows down perceived improvement relative to humans once the AI is massively better than humans at some task-relevant capabilities. In my expectation, this lesson carries over to some extent from narrow AI (like chess computers) to general AI (like language models). In terms of the transition from chimpanzee-level intelligence to Einstein, this means that the argument from the relatively short time span evolution took to bridge that gap is probably not as general as it might look at first sight, as chimpanzees and humans probably share similar architecture-induced cognitive gaps, whereas the bottlenecks of an AI could be very different.
This would suggest (maybe counterintuitively) that fast takeoff scenarios are more likely with cognitive architectures that are similar to humans than with very alien ones.
A world with no human musicians won’t happen, unless there is some extinction-level event that at a minimum leads to a new dark age. AI music will not outcompete human music (at least not to the point where the latter is not practised professionally any more), because a large part of the appeal of music is the knowledge that another human made it.
We have a similar situation today in chess. Of course a cellphone can generate chess games that are of higher quality (fewer errors, awesome positional and tactical play) than those of human world-class players. If one generates a sufficient number of such self-play games, some will even be beautiful and contain interesting new chess ideas. Still, nobody is interested in self-play games from my cellphone, precisely because anyone can make more of the same at almost no cost. The games of Magnus Carlsen, on the other hand, are followed and analysed and scrutinised by many, precisely because there is a struggle of human wits in each of these games and they are not abundantly available; they are masterpieces of human chess, and better (not worse) for the flaws we can easily discover in them with engine help.