This dialog was much less painful for me to read than I expected, and I think it manages to capture at least a little of the version-of-this-concept that I possess and struggle to articulate!
(...that sentence is shorter, and more obviously praise, in my native tongue.)
A few things I’d add (epistemic status: some simplification in attempt to get a gist across):
If there were a list of all the possible plans that cure cancer, ranked by “likely to work”, most of the plans that might work route through “consequentialism” and “acquire resources.”
Part of what’s going on here is that reality is large and chaotic. When you’re dealing with a large and chaotic reality, you don’t get to generate a full plan in advance, because the full plan is too big. Like, imagine a reasoner doing biological experimentation. If you try to “unroll” that reasoner into an advance plan that does not itself contain the reasoner, then you find yourself building this enormous decision-tree, like “if the experiments come up this way, then I’ll follow it up with this experiment, and if instead it comes up that way, then I’ll follow it up with that experiment”, and etc. This decision tree quickly explodes in size. And even if we didn’t have a memory problem, we’d have a time problem—the thing to do in response to surprising experimental evidence is often “conceptually digest the results” and “reorganize my ontology accordingly”. If you’re trying to unroll that reasoner into a decision-tree that you can write down in advance, you’ve got to do the work of digesting not only the real results, but the hypothetical alternative results, and figure out the corresponding alternative physics and alternative ontologies in those branches. This is infeasible, to say the least.
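To put toy numbers on that blowup (a sketch of my own, with made-up branching factors; the helper below is purely illustrative): if each experiment has k possible outcomes and the plan has to cover d experiments in sequence, the unrolled plan needs a branch for every possible outcome history.

```python
# Illustrative sketch: size of an "unrolled" plan that pre-specifies a response
# to every possible outcome history, for d sequential experiments with k
# outcomes each. The (k, d) values are hypothetical, chosen only to show the trend.

def unrolled_plan_size(k: int, d: int) -> int:
    """Nodes in a complete depth-d decision tree with k outcomes per step."""
    return sum(k ** i for i in range(d + 1))

for k, d in [(2, 10), (5, 10), (5, 20)]:
    print(f"{k} outcomes/step, {d} steps: {unrolled_plan_size(k, d):,} nodes")
# 2 outcomes/step, 10 steps: 2,047 nodes
# 5 outcomes/step, 10 steps: 12,207,031 nodes
# 5 outcomes/step, 20 steps: 119,209,289,550,781 nodes
```

And counting nodes understates it: each of those branches also needs the “digest these hypothetical results and rework the ontology accordingly” work done in advance.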
Reasoners are a way of compressing plans, so that you can say “do some science and digest the actual results”, instead of actually calculating in advance how you’d digest all the possible observations. (Note that the reasoner specification comprises instructions for digesting a wide variety of observations, but in practice it mostly only digests the actual observations.)
Like, you can’t make an “oracle chess AI” that tells you at the beginning of the game what moves to play, because even chess is too chaotic for that game tree to be feasibly representable. You’ve gotta keep running your chess AI on each new observation, to have any hope of getting the fragment of the game tree that you consider down to a manageable size.
Like, the outputs you can get out of an oracle AI are “no plan found”, “memory and time exhausted”, “here’s a plan that involves running a reasoner in real-time”, or “feed me observations in real-time and ask me only to generate a local and by-default-inscrutable action”. In the first two cases, your oracle is about as useful as a rock; in the third, it’s the realtime reasoner that you need to align; in the fourth, all the word “oracle” is doing is mollifying you unduly, and it’s this “oracle” that you need to align.
(NB: It’s not obvious to me that cancer cures require routing through enough reality-chaos that plans fully named in advance need to route through reasoners; eg it’s plausible that you can cure cancer with a stupid amount of high-speed trial and error. I know of no pivotal act, though, that looks so easy to me that nonrealtime-plans can avoid the above quadlemma.)
And it’s not obvious to me whether this problem gets better or worse if you’ve tried to train the oracle to only output “reasonable seeming plans”
My point above addresses this somewhat, but I’m going to tack another point on for good measure. Suppose you build an oracle and take the “the plan involves a realtime reasoner” fork of the above quadlemma. How does that plan look? Does the oracle say “build the reasoner using this simple and cleanly-factored mind architecture, which is clearly optimizing for thus-and-such objectives?” If that’s so easy, why aren’t we building our minds that way? How did it solve these alignment challenges that we find so difficult, and why do you believe it solved them correctly? Also, AIs that understand clean mind-architectures seem deeper in the tech tree than AIs that can do some crazy stuff; why didn’t the world end five years before reaching this hypothetical?
Like, specifying a working mind is hard. (Effable, transparent, and cleanly-factored minds are harder still, apparently.) You probably aren’t going to get your first sufficiently-good-reasoner from “project oracle” that’s training a non-interactive system to generate plans so hard that it invents its own mind architectures and describes their deployment; you’re going to get it from some much more active system that is itself a capable mind before it knows how to design a capable mind, like (implausible detail for the purpose of concrete visualization) the “lifelong learner” that’s been chewing through loads and loads of toy environments while it slowly accretes the deep structures of cognition.
Maybe once you have that, you can go to your oracle and be like “ok, you’re now allowed to propose plans that involve deploying this here lifelong learner”, but of course your lifelong learner doesn’t have to be a particularly alignable architecture; its goals don’t have to be easily identifiable and cleanly separable from the rest of its mind.
Which is mostly just providing more implausible detail that makes the “if your oracle emits plans that involve reasoners, then it’s the reasoners you need to align” point more concrete. But… well, I’m also trying to gesture at why the “what if we train the oracle to only output reasonable plans?” thought seems, to me, to come at it from a wrong angle, in a manner that I still haven’t managed to precisely articulate.
(I’m also hoping this conveys at least a little more of why the “just build an oracle that does alignment research” looks harder than doing the alignment research our own damn selves, and I’m frustrated by how people give me a pitying look when I suggest that humanity should be looking for more alignable paradigms, and then turn around and suggest that oracles can do that no-problem. But I digress.)
Also, AIs that understand clean mind-architectures seem deeper in the tech tree than AIs that can do some crazy stuff; why didn’t the world end five years before reaching this hypothetical?
Possible world: Alignment is too hard for a small group of people to cleanly understand, but not too far beyond that. In part because the profitability/researcher-status gradient doesn’t push AI research towards alignment, building an AI which is cleanly designed and aligned is a natural solution for a mid-level messy AI to find, even though that mid-level messy AI is still too dumb to help mainstream researchers gain a ton of power on the tasks they try it on, because gaining power is hard due to adversarial pressures.
(Having written that, I believe it less: one, because it involves a few independent details; two, because I don’t see why the mainstream researchers wouldn’t have elicited that capability when alignment researchers did.
I have an intuition that I didn’t fully express with the above, though, and so I’m not totally backing off of my hunch that there’s some gap in your argument which I quoted.)
Like, you can’t make an “oracle chess AI” that tells you at the beginning of the game what moves to play, because even chess is too chaotic for that game tree to be feasibly representable. You’ve gotta keep running your chess AI on each new observation, to have any hope of getting the fragment of the game tree that you consider down to a manageable size.
It’s not obvious to me how generally true this is. You can’t literally specify every move at the beginning of the game, but it seems like there could be instructions that work for more narrowly specified chess tasks. Like, I imagine a talented human chess coach could generate a set of instructions in English that would work well for defeating me at chess at least once (maybe there already exist “how to beat noobs at chess” instructions that will work for this). I would be unsurprised if there exists a set of human-readable instructions of human-readable length that would give me better-than-even odds of defeating a pre-specified chess expert at least once, that can be generated just by noticing and exploiting as-yet-unnoticed regularities in either that expert’s play in particular or human-expert-level chess in general.
It’s possible my intuition here is related to my complete lack of expertise in chess, and I would not be surprised if Magnus-Carlsen-defeating instructions do not exist (at least, not without routing through a reasoner). Still, I think I assign greater credence to shallow-pattern-finding AI enabling a pivotal act than you do, and I’m wondering if the chess example is probing this difference in intuition.
As a casual chess player, it seems unlikely to me that there are any such instructions that would lead a beginner to beat even a halfway decent player. Chess is very dependent on calculation (mentally stepping through the game tree) and evaluation (recognising whether a position is good or bad). Given the slow clock speed of the human brain (compared to computers), our calculations are slow, so we must lean heavily on a good learned evaluation function, which probably can’t be explicitly represented in a way that would be fast enough to execute manually. In other words, you’d end up taking hours to make a move or something.
There’s no shortcut like “just move these pawns 3 times in a mysterious pattern, they’ll never expect it”—“computer lines” that bamboozle humans require deep search that you won’t be able to do in realtime.
Edit: the Oracle’s best chance against an ok player would probably be to give you a list of trick openings that lead to “surprise” checkmate and hope that the opponent falls into one, but it’s a low percentage.
I’m not sure that this is true. (Depends a lot on what rating you define as “halfway decent”.) There are, in fact, rules that generalize over lots of board states, such as
capture toward the center
don’t advance the pawns around your king
early on, focus on getting knight/bishop to squares from which they have many moves
etc.
If I had one day to make such a list, I don’t think a beginner could use it to beat a 1200 player in, say, a 30 minute game. But I’m very uncertain about the upper limit of usefulness of such a list. I wonder about stuff like that a lot, but it’s very hard to tell. (Have you read a book about chess principles?)
I’m not even confident that you couldn’t beat Magnus. It depends on a bunch of factors, but perhaps you could just choose a line that seems forcing for black and try to specify enough branches of the tree to give you > 50% chance that it covers the game with Magnus. You could call this cheating, but it’s unclear how to formalize the challenge to avoid it. If Magnus knows who he’s playing against, this would make it significantly harder.
I’m very confident that Magnus absolutely crushes a beginner who has been given a personal chess book, of normal book length, written by God. Magnus still has all the advantages.
Magnus can evaluate moves faster and has a deeper search tree.
The book of chess can provide optimal opening lines, but the beginner needs to memorize them, and Magnus has a greater capacity for memorizing openings.
The book of chess can provide optimal principles for evaluating moves, but the beginner has to apply them, and decide what to do when they point in different directions. This comes from practice. A book of normal size can only provide limited practice examples.
The beginner will have a higher rate of blunders. It is hard to “capture toward the center” when you don’t even see the capture.
Some intuitions from chess books: the book God would give to a beginner is different to the book God would give a 1200 player. After reading a chess book, it is normal for ability to initially go down, until the advice has been distilled and integrated with practice. Reading a chess book helps you improve faster; it doesn’t make you instantly better.
Some intuitions from chess programs: they lose a lot of power if you cut down their search time to simulate the ability of a beginner to calculate variations, and also cut down their opening database to simulate the ability of a beginner to memorize openings, and also give them a random error chance to simulate a beginner’s rate of blunders.
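(A rough sketch of that handicapping, for anyone who wants to try it, assuming the python-chess package and a local Stockfish binary; the depth and blunder-rate knobs are guesses for illustration, not calibrated to any rating, and it skips the opening-book handicap.)

```python
# Rough sketch of the handicap experiment: very shallow search plus random
# blunders for the "beginner", full-depth search for the "expert". Assumes a
# Stockfish binary on PATH and the python-chess package (pip install chess).
import random

import chess
import chess.engine

def beginner_like_move(engine, board, depth=1, blunder_rate=0.15):
    """Shallow search, with an occasional random legal move as a blunder."""
    if random.random() < blunder_rate:
        return random.choice(list(board.legal_moves))
    return engine.play(board, chess.engine.Limit(depth=depth)).move

def expert_move(engine, board, depth=20):
    return engine.play(board, chess.engine.Limit(depth=depth)).move

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        board.push(beginner_like_move(engine, board))
    else:
        board.push(expert_move(engine, board))
print(board.result())  # expect the handicapped side to lose nearly every game
engine.quit()
```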
Sorry for the double response, but a separate point here is that your method of estimating the effectiveness of the best possible book seems dubious to me. It seems to be “let’s take the best book we have; the perfect book won’t be that much better”. But why would this be true, at all? We have applied tons of optimization pressure to chess and probably know that the ceiling isn’t that far above Stockfish, but we haven’t applied tons of optimization pressure to distilling chess. How do you know that the best possible book won’t be superior by some large factor? Why can’t the principles be so simple that applying them is easy? (This is a more general question; how can you e.g. estimate the effectiveness of the best possible textbook for some subfield of math?)
I’m a bit more sympathetic to this if we play Blitz, but for the most interesting argument, I think we should assume classical time format, where any beginner can see all possible captures.
Thanks for the double response. This line seems potentially important. If we could safely create an Oracle that can create a book of chess that massively boosts chess ability, then we could maybe possibly miraculously do the same thing to create a book that massively boosts AI safety research ability.
I agree that my argument above was pretty sketchy, just “intuitions” really. Here’s something a bit more solid, after further reflection.
I’m aware of adversarial examples and security vulnerabilities, so I’m not surprised if a superintelligence is able to severely degrade human performance via carefully selected input. A chess book that can make Magnus lose to a beginner wouldn’t surprise me. Neither would a chess book that degraded a beginner’s priorities such that they obsessed about chess, for however many Elo points that would be worth.
But mostly this problem is in the opposite direction: can we provide carefully curated input that allows an intelligence to learn much faster? In this direction the results seem much less dramatic. My impression is that the speed of learning is limited by both the inputs and the learner. If the book of chess is a perfect input, then the limiting factor is the reader, and an average reader won’t get outsized benefits from perfect inputs.
Possible counter-argument: supervised learning can outperform unsupervised learning by some large factor, and data quality can likewise have a big impact. That’s fine, but every chess book I’ve read has been supervised learning, and chess books are already higher data quality than scraping r/chess. So those optimizations have already been made.
Possible counter-argument: few-shot learning in GPT-3? This seems more like surface knowledge that is already in the language model. So maybe a chess beginner already has the perfect chess algorithm somewhere in their brain, and the chess book just needs to surface that model and suppress all the flawed models that are competing with it? I don’t buy it, that’s not what it feels like learning chess from the inside, but maybe I need to give the idea some weight.
Possible counter-argument: maybe humans are actually really intelligent and really good learners and the reason we’re so flawed is that we have bad inputs? Eg from other flawed humans, random chance hiding things, biases in what we pay attention to, etc. I don’t buy this, but I don’t actually have a clear reason why.
But mostly this problem is in the opposite direction: can we provide carefully curated input that allows an intelligence to learn much faster? In this direction the results seem much less dramatic. My impression is that the speed of learning is limited by both the inputs and the learner. If the book of chess is a perfect input, then the limiting factor is the reader, and an average reader won’t get outsized benefits from perfect inputs.
Which results did you have in mind? The ‘machine teaching’ results are pretty dramatic and surprising, although one could question whether they have any practical implications.
I wasn’t aware of them. Thanks. Yes, that’s exactly the sort of thing I’d expect to see if there was a large possible upside in better teaching materials that an Oracle could produce. So I no longer disagree with Rafael & Richard on this.
But mostly this problem is in the opposite direction: can we provide carefully curated input that allows an intelligence to learn much faster? In this direction the results seem much less dramatic. My impression is that the speed of learning is limited by both the inputs and the learner. If the book of chess is a perfect input, then the limiting factor is the reader, and an average reader won’t get outsized benefits from perfect inputs.
My problem with this is that you’re treating the amount of material as fixed and abstracting it as “speed”; however, what makes me unsure about the power of the best possible book is that it may choose a completely different approach.
E.g., consider the “ontology” of high-level chess principles. We think in terms of “development” and “centralization [of pieces]” and “activity” and “pressure” and “attacking” and “discoveries” and so forth. Presumably, most of these are quite helpful; if you have no concept of discoveries, you will routinely place your queen or king on inconvenient squares and get punished. If you have no concept of pressure, you have no elegant way of reacting pre-emptively if your opponent starts aligning a lot of pieces toward your king, et cetera.
So, at the upper end of my probability distribution for how good a book would be, it may introduce a hundred more such concepts, each one highly useful to elegantly compress various states. It will explain them all in the maximally intuitive and illustrative way, such that they all effortlessly stick, in the same way that sometimes things you hear just make sense and fit your aesthetic, and you recall them effortlessly. After reading this book, a beginner will look at a bunch of moves of a 2000 Elo player, and go “ah, these two moves clearly violate principle Y”. Even though this player has far less ability to calculate lines, they know so many elegant compressions that they may compensate in a direct match. Much in the same way that you may beat someone who has practiced twice as long as you but has no concept of pressure; they just can’t figure out how to spot situations from afar where their king is suddenly in trouble.
Isn’t it trivial for the beginner to beat Magnus using this book? God just needs to predict Magnus perfectly, and write down a single list of moves that the beginner needs to follow to beat him. Half a page is enough.
In general, you ignored this approach, which is the main reason why I’m unsure whether a book from a superintelligence could beat Magnus.
I read your idea of “a line that seems forcing for black”, and I interpreted it as being forcing for black in general, and responded in terms of memorizing optimal opening lines. It sounds like you meant a line that would cause Magnus in particular to respond in predictable ways? Sorry for missing that.
I can imagine a scenario with an uploaded beginner and an uploaded Magnus in a sealed virtual environment running on error-correcting hardware with a known initial state and a deterministic algorithm, and your argument goes through there, and in sufficiently similar scenarios.
Whereas I had in mind a much more chaotic scenario. For example, I expect Magnus’s moves to depend in part on the previous games he played, so predicting Magnus requires predicting all of those games, and thus the exponential tree of previous games. And I expect his moves to depend in part on his mood, eg how happy he’d be with a draw. So our disagreement could be mostly about the details of the hypothetical, such as how much time passes between creating the book and playing the game?
I read your idea of “a line that seems forcing for black”, and I interpreted it as being forcing for black in general
So to clarify: this interpretation was correct. I was assuming that a superintelligence cannot perfectly predict Magnus, pretty much for the reasons you mention (dependency on previous games, mood, etc.). But I then changed that standard when you said:
I’m very confident that Magnus absolutely crushes a beginner who has been given a personal chess book, of normal book length, written by God.
Unlike a superintelligence, surely God could simulate Magnus perfectly no matter what; this is why I called the problem trivial—if you invoke God.
If you don’t invoke God (and thus can’t simulate Magnus), I remain unsure. There are already games where world champions play the top move recommended by the engine 10 times in a row, and those have not been optimized for forcing lines. You may overestimate how much uncertainty or variance there really is. (Though again, if Magnus knows what you’re doing, it gets much harder since then he could just play a few deliberately bad moves and get you out of preparation.)
Yes, I used “God” to try to avoid ambiguity about (eg) how smart the superintelligence is, and ended up just introducing ambiguity about (eg) whether God plays dice. Oops. I think the God hypothetical ends up showing the usual thing: Oracles fail[1] at large/chaotic tasks, and succeed at small/narrow tasks. Sure, more things are small and narrow if you are God, but that’s not very illuminating.
So, back to an Oracle, not invoking God, writing a book of chess for a beginner, filling it with lines that are forcing for black, trying to get >50% of the tree. Why do we care, why are we discussing this? I think because chess is so much smaller and less chaotic than most domains we care about, so if an Oracle fails at chess, it’s probably going to also fail at AI alignment, theorem proving, pivotal acts, etc.
There are some simple failure cases we should get out of the way:
As you said, if Magnus knows or suspects what he’s playing against, he plays a few lower probability moves and gets out of the predicted tree. Eg, 1. e4 d6 is a 1% response from Magnus. Or, if Magnus thinks he’s playing a beginner, then he uses the opportunity to experiment, and becomes less predictable. So assume that he plays normally, predictably.
If Magnus keeps playing when he’s in a lost position, it’s really hard for a move to be “forced” if all moves lead to a loss with correct play. One chess principle I got from a book: don’t resign before the endgame if you don’t know that your opponent can play the endgame well. Well, assume that Magnus resigns a lost position.
What if the beginner misremembers something, and plays the wrong move? How many moves can a beginner remember, working from an Oracle-created book that has everything pre-staged with optimized mnemonics? I assume 1,000 moves, perfect recall. 10 moves per page for a 100 page book.
So we need to optimize for lines that are forcing, short, and winning[2]. Shortness is important because a 40-move line where each move is 98% forced is overall only ~45% forcing, and because we can fit more short lines into our beginner’s memory. If you search through all top-level chess games and find ones where the players play the engine-recommended move ten times in a row, that is optimizing for winning (from the players) and forcing (from the search). Ten moves isn’t long enough; we need ~30 moves for a typical game.
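(Spelling out the arithmetic on shortness; the 98% figure is the one above, and the line lengths are just for illustration:)

```python
# If each move in a line is independently ~98% "forced", the chance that a whole
# game stays inside the line decays geometrically with the line's length.
p_forced = 0.98
for n_moves in (10, 20, 30, 40):
    print(f"{n_moves} moves: {p_forced ** n_moves:.0%} of games stay in the line")
# 10 moves: 82%, 20 moves: 67%, 30 moves: 55%, 40 moves: 45%
```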
Terrible estimate: with 500,000 games in chessgames.com, say there are 50 games with forcing lines of ten moves, a 10,000x reduction. An Oracle can search better, for games that haven’t been played yet. So maybe if Oracle searched through 5 trillion games it would find a game with a forcing line of 20 moves? At some point I question whether chess can be both low variance enough to have these long forcing lines, and also high variance enough to have so many potential games to search through. Of course chess has ample variance if you allow white to play bad moves, but then you’re not winning.
Another approach, trying to find a forcing opening, running through the stats on chessgames.com in a greedy way, I get this “Nimzo-Indian, Samisch” variation, which seems to be playable for both sides, but perhaps slightly favors black:
1. d4 Nf6 (73% forced—Magnus games)
2. c4 e6 (72% forced—Magnus games)
3. Nc3 Bb4 (83% forced—all games)
4. a3 Bxc3+ (100% forced—all games)
5. bxc3 c5 (55% forced—all games)
6. f3 d5 (85% forced—all games)
Multiplying that through gets 20% forcing over six moves. So maybe Oracle is amazingly lucky and there are hitherto undiscovered forcing lines directly from this well-known position to lost positions for black, missed by Stockfish, AlphaZero, and all humans. Well, then Oracle still needs to cover another 30% of the tree and get just as lucky a few more times. If that happens, I think I’m in crisis of faith mode where I have to reevaluate whether grandmaster chess was an elaborate hoax. So many positions we thought were even turn out to be winning for white, everyone missed it, what happened?
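(The “multiplying that through” step, for anyone checking; the frequencies are the ones quoted above, treated as independent:)

```python
# Probability that a game follows the whole line above, multiplying the quoted
# reply frequencies together.
reply_freqs = [0.73, 0.72, 0.83, 1.00, 0.55, 0.85]
p_line = 1.0
for f in reply_freqs:
    p_line *= f
print(f"{p_line:.0%}")  # ~20%
```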
Where “fail” means “no plan found”, “memory and time exhausted”, “here’s a plan that involves running a reasoner in real-time” or “feed me observations in real-time and ask me only to generate a local and by-default-inscrutable action”, as listed by so8res above.
It doesn’t help that chess players also search for lines that are forcing, short, and winning, at least some of the time.
You can consider me convinced that the “find forcing lines” approach isn’t going to work.
(How well the perfect book could “genuinely” teach someone is a different question, but that’s definitely not enough to beat Magnus.)
Yeah, this is part of what I was getting at. The narrowness of the task “write a set of instructions for a one-off victory against a particular player” is a crucial part of what makes it seem not-obviously-impossible to me. Fully simulating Magnus should be adequate, but then obviously you’re invoking a reasoner. What I’m uncertain about is whether you can write such instructions without invoking a reasoner.
I agree that it’s plausible chess-plans can be compressed without invoking full reasoners (and with a more general point that there are degrees of compression you can do short of full-on ‘reasoner’, and with the more specific point that I was oversimplifying in my comment). My intent with my comment was to highlight how “but my AI only generates plans” is sorta orthogonal to the alignment question, which is pushed, in the oracle framework, over to “how did that plan get compressed, and what sort of cognition is involved in the plan, and why does running that cognition yield good outcomes”.
I have not yet found a pivotal act that seems to me to require only shallow realtime/reactive cognition, but I endorse the exercise of searching for highly specific and implausibly concrete pivotal acts with that property.