I’m not sure there exactly is an “underlying quantity” here. Differences in rating, hence odds ratios in results, are fairly well defined (though, note, it’s not like there’s any sort of necessary principle along the lines of “if when A plays lots of games against B their odds are a:b, and if when B plays lots of games against C their odds are b:c, then when A plays lots of games against C their odds are a:c”, which the Elo scale and the usual ways of updating Elo ratings in the light of results are effectively assuming IIUC). But I don’t think there’s an absolute thing that the differences are sensibly regarded as differences in or ratios of.
I guess you could try to pick some “canonical” single player—a truly random player, or a literally perfect player—and look at odds ratios there. But I think the assumption I mentioned in the previous paragraph really does break down in that case.
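The transitivity assumption in the first paragraph can be made concrete. The sketch below assumes the standard logistic Elo model on the usual 400-point scale, and the three ratings are hypothetical. Under that model the ratio of two players' expected scores depends only on their rating difference, so "a:b and b:c implies a:c" is built in by construction. Whether real players actually behave this way is the empirical question, and the code does not settle it.

```python
# Sketch of the standard Elo (logistic) model, assuming the usual 400-point scale.
# Under this model, E_A / E_B depends only on the rating difference R_A - R_B.

def score_ratio(rating_a: float, rating_b: float) -> float:
    """Ratio of A's expected score to B's expected score (E_A / E_B)."""
    return 10 ** ((rating_a - rating_b) / 400)

# Hypothetical ratings for three players A, B, C (illustrative only).
a, b, c = 2000.0, 1700.0, 1500.0

ab = score_ratio(a, b)
bc = score_ratio(b, c)
ac = score_ratio(a, c)

# Rating differences add (300 + 200 = 500), so odds compose by multiplication.
print(f"A vs B: {ab:.3f}, B vs C: {bc:.3f}, A vs C: {ac:.3f}")
print(f"(A vs B) * (B vs C) = {ab * bc:.3f}")
```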
I’m not sure I understand this point well.
I expanded on it a couple of paragraphs below. If that still didn’t clarify, can you say a bit about what doesn’t make sense?
That ~600 point difference is much smaller than the 1800+ point difference between Carlsen and the median chess player [...] the argument that I made [] that Carlsen was closer to optimal play than to median, not that optimal play was not much better than Carlsen.
Hmm, maybe I misunderstood something? You wrote
It seems that for many cognitive tasks, the median practitioner is often much closer to beginner/completely unskilled/random noise than they are to the best in the world [… footnote:] My intuitive sense is that [...] the gap between Magnus Carlsen and the median human in chess ability is 10x − 1000x the gap between the median human and a dumb human.
It’s the “10x-1000x” I’m disputing, not any version of “much closer” that’s compatible with the “right” numbers being on the order of 0 / 1000 / 3000.
As for any relationship to optimal play, I was getting that from assuming that what you said about chess was intended as support for drawing a scale with “idiot” on the left and “Einstein” on the right rather than one with “mouse” on the left and “vast superhuman intelligence” on the right. (My own feeling is that what we actually want is more likely a scale with all four of those points on it, and no pair super-close together. Idiots really are much smarter than mice; Einstein really is much smarter than an idiot; God really is much smarter than Einstein; and for many purposes none of those gaps is so large or so small as to make the others negligible. For difficult enough tasks, idiots and mice may be indistinguishable, or even idiots and mice and average people, but those are also the tasks for which I expect hypothetical superintelligences to have big advantages over the smartest humans.)
So I took you to be suggesting that the median-Carlsen gap is large enough that at least one of these pictures is not misleading: ||---------| (vertical bars are idiot, median, Carlsen) or |------------|| (vertical bars are median, Carlsen, superhuman machine). And I don’t agree with either; I think the idiot-median-Carlsen-machine picture is more like |-----|----------|-----| except that I don’t know how much room there is for that last gap to grow as the machines continue to improve.
And my reason for thinking this is that (1) if you use something like log odds (roughly equivalent to Elo ratings) to measure the gap sizes, then that’s what happens, and (2) if you use the odds themselves[1] then indeed the larger gaps become larger by huge factors, but in that case the |-----|--------| pictures where you put gaps next to one another and compare their sizes are completely misleading, because the correct way to combine two gaps is not to put them side by side (which amounts to adding up their sizes) but to multiply their odds ratios, and (3) these large-factor gaps don’t seem to me to be reason to prefer your “just idiot...Einstein” scale to the sort that Yudkowsky likes to draw, because the point Yudkowsky is trying to make by drawing them is about what happens to the right of Einstein, and there is good reason to think that there’s plenty (in particular, a large-odds-ratio gap) to the right of Carlsen.
[1] Insert Schiller quote here :-).
It’s the “10x-1000x” I’m disputing, not any version of “much closer” that’s compatible with the “right” numbers being on the order of 0 / 1000 / 3000.
I think that’s basically correct. Magnus Carlsen’s expected score vs a median human is 100s of times greater than a median human’s expected score vs a dumb human (as inferred from their Elo ratings; I sketched a rough calculation at the end of this post).
As for the remainder of your reply, the point of Yudkowsky’s I was contending with is the claim that Einstein is very close to an idiot in absolute terms (especially compared to the difference between an idiot and a chimpanzee).
I wasn’t touching on how superintelligences compare to Einstein.
I don’t think you mean exactly that
Magnus Carlsen’s expected score vs a median human is 100s of times greater than a median human’s expected score vs a dumb human
since to a good approximation Carlsen gets 100 wins, 0 draws, 0 losses against a median human for a total score of 100, and a median human gets at least an expected score of 50 against a dumb human.
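To see the distinction being drawn here, a rough sketch using the standard Elo expected-score formula E = 1 / (1 + 10^(-Δ/400)), assuming the 1,850- and 1,000-point gaps used elsewhere in this thread and a hypothetical 100-game match. It prints each side's expected total and the ratio between the two totals.

```python
def expected_score(rating_diff: float) -> float:
    """Expected score per game for a player rated `rating_diff` points above the opponent."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

GAMES = 100  # hypothetical match length, matching the "100 wins" framing

for label, diff in [("Carlsen vs median", 1850), ("median vs dumb", 1000)]:
    mine = GAMES * expected_score(diff)
    theirs = GAMES - mine  # the two players' scores in a match sum to GAMES
    print(f"{label}: {mine:.3f} vs {theirs:.4f} points, ratio {mine / theirs:,.0f}x")
```

Both stronger players come out very close to 100 out of 100; the numbers only become huge once you divide by the weaker player's tiny expected score, i.e. when you look at the ratio (the odds) rather than at the expected score itself.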
I do not dispute that there are ways of doing the accounting that make the Carlsen-median gap 100x (or 1000x or whatever) bigger than the median-dumbest gap. My claim is that for most purposes those ways of doing the accounting are worse.
I can’t tell whether you think my reasons for thinking that are too stupid to deserve a response, or think they miss the point in some fundamental way, or don’t understand them, or just aren’t very interested in discussing them. But that’s where the actual disagreement lies.
As for mouse/chimp/idiot/Einstein, my general model of these things is that for most mental tasks there’s a minimum level of brainpower needed to do them at all, which for things we think of as interesting mental tasks generally lies somewhere between “idiot” and “Einstein” (because if even idiots could do them easily we wouldn’t think of them as interesting mental tasks, and if even Einsteins couldn’t do them we mostly wouldn’t think of them at all), and sometimes but maybe not always a maximum level of brainpower needed to do them about as well as possible, which might or might not also be somewhere between “idiot” and “Einstein”, and then the biggest delta is the one between not being able to Do The Thing and being able to do it, and after that any given increment matters a lot until you get to the maximum, and after that nothing matters much.[1] So when we pay attention to some specific task we think of as a difficult thing humans can do, we should expect to find “mouse”, “chimp”, “idiot”, and some further portion of the human population all clustered together and “Einstein” some way away. But there are also tasks that pretty much all humans can do, some of which distinguish (e.g.) mice from chimps, and I think it’s fair to say that there is a real sense in which humans are closer to chimps than chimps are to mice even though for human-ish mental tasks there’s no difference to speak of between mice and chimps; and for some tasks there is probably huge scope for doing better than the best humans. (Some of those tasks may be ones it has never occurred to us to try because they are beyond our conception.) I think the question of how much room there is above “Einstein” on the scale is highly relevant if you are asking how close “idiot” and “Einstein” are.
[1] Of course this is a simplification; minima and maxima for this sort of thing are usually “soft” rather than “hard”, and most interesting tasks actually involve a variety of skills whose minima and maxima won’t all be in the exact same place, and brainpower isn’t really one-dimensional, etc., etc., etc. I assume you appreciate all these things as well as I do :-).
Magnus Carlsen’s expected score vs a median human is 100s of times greater than a median human’s expected score vs a dumb human
since to a good approximation Carlsen gets 100 wins, 0 draws, 0 losses against a median human for a total score of 100, and a median human gets at least an expected score of 50 against a dumb human.
What’s your issue with the below calculation?
400 points of Elo represents a 10x difference in expected score:
“It then follows that for each 400 rating points of advantage over the opponent, the expected score is magnified ten times in comparison to the opponent’s expected score.”
Median-dumb Elo difference is 1,000 points: $10^{2.5}\times$ difference in expected score
Magnus-median Elo difference is 1,850 points: $10^{4.625}\times$ difference in expected score
Magnus-median gap is >100x median-dumb gap:
$$\frac{10^{4.625}}{10^{2.5}} = 10^{4.625-2.5} = 10^{2.125} \approx 133.35\times$$
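A minimal sketch of the same arithmetic, assuming the 400-points-per-10x rule and the two rating gaps quoted above (`score_ratio_from_gap` is a hypothetical helper name).

```python
def score_ratio_from_gap(gap: float) -> float:
    """How many times the stronger player's expected score exceeds the weaker's."""
    return 10 ** (gap / 400)

median_vs_dumb = score_ratio_from_gap(1000)    # 10**2.5
magnus_vs_median = score_ratio_from_gap(1850)  # 10**4.625

# Dividing the two ratios subtracts the exponents: 10**(4.625 - 2.5) = 10**2.125.
ratio_of_ratios = magnus_vs_median / median_vs_dumb
print(f"{median_vs_dumb:.1f}x, {magnus_vs_median:.0f}x, ratio {ratio_of_ratios:.2f}x")
```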
I do not dispute that there are ways of doing the accounting that make the Carlsen-median gap 100x (or 1000x or whatever) bigger than the median-dumbest gap. My claim is that for most purposes those ways of doing the accounting are worse.
I can’t tell whether you think my reasons for thinking that are too stupid to deserve a response, or think they miss the point in some fundamental way, or don’t understand them, or just aren’t very interested in discussing them. But that’s where the actual disagreement lies.
You’ve not addressed what you think is wrong with the above calculation, so I’m just confused. I think it’s basically a canonical quantification of chess ability?
I think the question of how much room there is above “Einstein” on the scale is highly relevant if you are asking how close “idiot” and “Einstein” are.
We could also just ask whether idiot was closer to chimpanzee than to Einstein. I’m mostly interested in how long it takes AI to cross the human cognitive frontier, not whether strongly superhuman AI is possible (I think it is).
My issue with the calculation isn’t with the calculation. It is indeed correct (with the usual assumptions, which are probably somewhat wrong but it doesn’t much matter) that if Magnus plays many games against a median chessplayer then he will probably get something like 10^4.6 times as many points as they do, and that if a median chessplayer plays a maximally-dumb one then they will probably get something like 10^2.5 times as many points as the maximally-dumb one, and that the ratio between those two ratios is on the order of 100x. I don’t object to any of that, and never have.
I feel, rather, that that isn’t a very meaningful calculation to be doing, if what you want to do is to ask “how much better is Carlsen than median, than median is better than dumbest?”.
More specifically, my objections are as follows. (They overlap somewhat.)
0. The odds-ratio figure you are using is by no means the canonical way to quantify chess ability. Consider: the very first thing you said on the topic was “As for how to measure gaps, I think the negative of the logarithm of the [odds] is good”. I agree with that: I think log-odds is better for most purposes.
1. For multiplicative things like these odds ratios, I think it is generally misleading to say “gap 1 is 10x as big as gap 2” when you mean that the odds ratio for gap 1 is 10x bigger. I think that e.g. “gap 1 is twice as big as gap 2” should mean that gap 1 is like one instance of gap 2 and then another instance of gap 2, which for odds ratios means that the odds ratio for gap 1 is the square of the odds ratio for gap 2. By that way of thinking, the median-Magnus gap is less than twice the size of the dumbest-median gap. Your terminology requires you to say that the median-Magnus gap simultaneously (a) is hundreds of times bigger than the dumbest-median gap and (b) is not large enough to fit two copies of the dumbest-median gap into (i.e., find someone X as much better than median as median is than dumbest; and then find someone Y as much better than X as median is than dumbest; if you do that, Y will be better than Magnus).
2. If you are going to draw diagrams like the Yudkowsky scale, which implicitly compare “gap sizes” by placing gaps next to one another, then you had better be using a measure of difference that behaves additively rather than multiplicatively. Because that’s the only way for the relative distances of A, B, C along the scale to convey accurately the relationship between the A-B, B-C, and A-C gaps. (You could of course make a scale where position is proportional to, say, “odds ratio against median player”. That will make the dumbest-median gap very small and the median-Magnus gap very large. But it will also make an “odds ratio 10:1” gap vary hugely in size depending on where on the scale it is, which I don’t think is what you want to do.)
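The point about adding versus multiplying can be checked numerically. This sketch assumes the same 1,000- and 1,850-point gaps and the 400-point Elo convention; `odds` is a hypothetical helper name. Stacking two copies of the dumbest-median gap on the Elo scale is the same operation as squaring its odds ratio, and either way the stacked gap is larger than the median-Magnus gap.

```python
DUMB_TO_MEDIAN = 1000    # Elo points, as used in this thread
MEDIAN_TO_MAGNUS = 1850  # Elo points, as used in this thread

def odds(gap: float) -> float:
    """Score ratio (stronger : weaker) implied by an Elo gap."""
    return 10 ** (gap / 400)

# Additive view: X = median + 1000, Y = X + 1000, so Y is 2000 above median.
two_copies_elo = DUMB_TO_MEDIAN + DUMB_TO_MEDIAN
# Multiplicative view of the same step: square the odds ratio.
two_copies_odds = odds(DUMB_TO_MEDIAN) ** 2

print("Y beats Magnus:", two_copies_elo > MEDIAN_TO_MAGNUS)
print("Magnus gap / dumbest gap, Elo:", MEDIAN_TO_MAGNUS / DUMB_TO_MEDIAN)
print("Magnus gap / dumbest gap, odds:", odds(MEDIAN_TO_MAGNUS) / odds(DUMB_TO_MEDIAN))
```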
#0. Regarding the log of the odds ratios, I want to clarify that I never meant it as a linear scale. I was working with the intuition that equal gaps on a logarithmic scale correspond to exponential (multiplicative) changes in the underlying quantity.
#1. I get what you’re saying, but I think this objection would apply to any logarithmic scale; do you endorse that conclusion/generalisation of your objection?
If the gap between two points on a logarithmic scale is d, and that represents a change by a factor of D in the underlying quantity, then a gap of 2d represents a change by a factor of D² in the underlying quantity.
Talking about change may help sidestep the issues arising from different intuitions about what gaps should mean.
My claim above was that the underlying quantity was (a linear measure of) “chess ability”, and the Elo scale had that kind of logarithmic relationship to it.
#2. I was implicitly making the transformation above, where I converted a logarithmic scale into a linear/additive scale.
I agree that it doesn’t make sense to use nonlinear scales when talking about gaps. I also agree that Elo rating is one such nonlinear scale.
My claim about the size of the gap was made after converting the nonlinear Elo rating to the ~linear “expected score”. Hence I spoke about gaps in expected score.
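The log-scale relationship in #1 can be written out as a short derivation. It assumes, purely for illustration, that position on the scale is $x = \log_b Q$ for some positive underlying quantity $Q$ and base $b > 1$; for Elo with the 400-point convention, $b = 10^{1/400}$.

$$
\Delta x = d \iff \frac{Q_2}{Q_1} = b^{d} = D, \qquad
\Delta x = 2d \iff \frac{Q_3}{Q_1} = b^{2d} = \left(b^{d}\right)^{2} = D^{2}.
$$

With $b = 10^{1/400}$, a 400-point gap gives $D = 10$ and an 800-point gap gives $D^2 = 100$.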
I think the crux is this:
What do you think is the best/most sensible linear measure of chess ability?
(By linear measure, I mean that a difference of kx is k times as big as a difference of x.)
I am not sure exactly what you’re asking me whether I endorse, but I do indeed think that for “multiplicative” things that you might choose to measure on a log scale, “twice as big a gap” should generally mean 2x on the log scale or squaring on the ratio scale.
If you think it doesn’t make sense to use nonlinear scales when talking about gaps, and think Elo rating is nonlinear while exp(Elo rating) is linear, then you are not agreeing but radically disagreeing with me. I think Elo rating differences are a pretty good way of measuring gaps in chess ability, and I think exp(Elo rating) is much worse.
I think Elo rating is nearer to being a linear measure of chess ability than odds ratio, to whatever extent that statement makes sense. I think that if you spend a while doing puzzles every day and your rating goes up by 50 points (~1.33x improvement in odds ratio), and then you spend a while learning openings and your rating goes up by another 50 points, then it’s more accurate to say that doing both those things brought twice the improvement that doing just one did (i.e., 100 points versus 50 points) than to say it brought 1.33x the improvement that doing just one did (i.e., 1.78x odds versus 1.33x odds). I think that if you’re improving faster and it’s 200 points each time (~3x odds) then it doesn’t suddenly become appropriate to say that doing both things brought 3x the improvement of doing one of them. I think that if you’re enough better than me that you get 10x more points than I do when we play, and if Joe Blow is enough better than you that he gets 10x more points than you do when you two play, then the gap between Joe and me is twice as big as the gap between you and me or the gap between Joe and you, not 10x as big, because the big gap can be thought of as made up of two identical smaller gaps.
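The puzzles-and-openings example can be reproduced under the same 400-point convention, using the step sizes from the paragraph above (`odds_factor` is a hypothetical helper name). On the Elo scale, two equal steps are always exactly twice one step; measured by odds ratio, the apparent "ratio of improvements" changes with the step size.

```python
def odds_factor(points: float) -> float:
    """Multiplicative change in score ratio for an Elo gain of `points`."""
    return 10 ** (points / 400)

for step in (50, 200):
    one = odds_factor(step)       # one improvement (puzzles, say)
    both = odds_factor(2 * step)  # two equal improvements in a row
    # Elo view: (2 * step) / step == 2 regardless of step.
    # Odds view: both / one == odds_factor(step), which depends on step.
    print(f"{step}-point steps: one={one:.2f}x, both={both:.2f}x, "
          f"Elo ratio=2, odds ratio={both / one:.2f}")

# Joe Blow example: two 10x gaps (400 points each) stack to 800 points, a 100x ratio.
joe_vs_me = odds_factor(400 + 400)
print(f"Joe vs me: {joe_vs_me:.0f}x")
```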