Raising the waterline
Among the goals of Less Wrong is to “raise the sanity waterline” of humanity. We’ve also talked about “raising the rationality waterline”: the phrase is somewhat popular around these parts, which suggests that the metaphor is catchy. But is that all there is to it, a catchy metaphor? Or can the phrase be more usefully cashed out?
While reading Nate Silver’s The Signal and the Noise, I came across a discussion of “raising the waterline” which fleshes out the metaphor with a more substantial model. This model preserves some of the salient aspects of the metaphor as discussed on LW, for instance the perception that the current waterline (as regards sanity and rationality) is “ridiculously low”. More interestingly, it spells out some of the specific ways that a “waterline” belief should constrain our future sensory experiences, maybe even to the point of quantifying what should result from low (or rising) waterlines.
This is intended as a short series:
“Raising the waterline”, this introductory post, will summarize Nate Silver’s “waterline” model, within its original context of playing Poker, which Silver frames as a game of prediction under uncertainty. Poker therefore serves as a “toy model” for a much more general class of problems.
“Raising the forecasting waterline” will extend the discussion to the kind of forecasts studied by Philip Tetlock’s Good Judgment Project, a prediction game somewhat similar to PredictionBook and related to prediction markets; I will leverage the waterline model to extract useful insights from my participation in GJP.
“Raising the discussion waterline”, a shamelessly speculative coda, will relate the previous two posts to the question of “how do Internet discussions reliably lead to correct inferences from true beliefs, or fail to do so”; I will argue that the waterline model brings some hope that a few basic tactics could nevertheless provide large wins, and raise the more general question of what other low waterlines we could aim to exploit.
The Model
The waterline model is introduced in Chapter 10, “The Poker Bubble”, to explain how, for a period in the 2000s, Silver found it fairly easy to make a living playing online poker, and why this source of revenue later dried up altogether.
I was one of those people. I lived the poker dream for a while, and then it died. I learned that poker sits at the muddy confluence of the signal and the noise. My years in the game taught me a great deal about the role that chance plays in our lives and the delusions it can produce when we seek to understand the world and predict its course.
One graph neatly summarizes all features of the model, and strikes me as a good candidate for illustrating the “one picture worth a thousand words” dictum:
The horizontal dimension is “effort” or “experience”. One possible unit of measurement there could be “hours”—with the caveat that they should be hours of deliberate practice. (I’m not entirely sure why this is expressed as a percentage—I’ll come back to taking the graph with a grain of salt.)
The vertical dimension is labeled “accuracy”, but we could more simply call it “gain”. One possible unit of measurement could be “money earned over some period of time playing the game”.
Chance, skill and practice
I want to briefly come back to the distinction between experience and practice, as this is a very important but often missed point.
Poker, like all games of chance driven by random reinforcement, is highly addictive. (In the days after Silver’s description piqued my interest and I gave the game a try, I lost more hours to it than I care to think about, and I was only playing computer opponents for virtual chips: poker’s methadone, compared to the crack-like properties of online play for real money.¹) It is entirely possible to spend a lot of time in actual play without ever having much to show for it in terms of improvement.
Thus, a less easily measurable but more appropriate construct for the horizontal axis would be something like “number of basic insights absorbed, in the appropriate order”. One way to operationalize this would be to devise a number of tests and apply them externally to a player’s behaviour: do they fold under circumstance X, raise under circumstance Y, can they quantify this or that aspect of the game?
This distinction is well-known in other domains, such as software engineering, where “10 years of experience is not the same as one year of experience repeated 10 times” turns out to be a useful mantra in hiring situations.
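To make the idea of external tests concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Decision` record, its coarse hand categories, and the 90% pass threshold are illustrative assumptions, and the two tests simply borrow the basic habits Silver mentions (folding the worst hands, betting the best ones). The point is only that “insights absorbed” can be counted from observed behaviour rather than from hours played.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Decision:
    """One observed decision (hypothetical and deliberately coarse)."""
    hand: str    # "worst", "middling" or "best"
    action: str  # "fold", "call" or "raise"


# A test returns None when it does not apply to a decision,
# otherwise True or False depending on whether the decision shows the insight.
InsightTest = Callable[[Decision], Optional[bool]]


def folds_worst_hands(d: Decision) -> Optional[bool]:
    return d.action == "fold" if d.hand == "worst" else None


def bets_best_hands(d: Decision) -> Optional[bool]:
    return d.action == "raise" if d.hand == "best" else None


TESTS: Dict[str, InsightTest] = {
    "folds worst hands": folds_worst_hands,
    "bets best hands": bets_best_hands,
}


def insights_absorbed(history: List[Decision], threshold: float = 0.9) -> List[str]:
    """Names of the tests passed in at least `threshold` of the decisions they apply to."""
    absorbed = []
    for name, test in TESTS.items():
        results = [r for r in (test(d) for d in history) if r is not None]
        if results and sum(results) / len(results) >= threshold:
            absorbed.append(name)
    return absorbed


history = [
    Decision("worst", "fold"),
    Decision("worst", "fold"),
    Decision("best", "call"),   # a missed chance to bet a strong hand
    Decision("best", "raise"),
]
print(insights_absorbed(history))  # ['folds worst hands']
```

A player who has logged many hours but keeps failing the same tests would score no higher than a beginner, which is exactly the distinction between experience and practice.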
Poker’s Pareto Principle
The most interesting features of the model are the curve itself, which relates effort to gain, and the “waterline”.
The curve isn’t linear, but follows the same shape as a Pareto distribution: it obeys the “80/20” principle most often associated with Pareto’s original observation in economics, that twenty percent of the population holds eighty percent of the wealth. Here, the idea is that twenty percent of the effort is enough to bring you to a level of performance better than eighty percent of the population. Not bad!
In poker, for instance, simply learning to fold your worst hands, bet your best ones, and make some effort to consider what your opponent holds will substantially mitigate your losses. If you are willing to do this, then perhaps 80 percent of the time you will be making the same decision as one of the best poker players like Dwan—even if you have spent only 20 percent as much time studying the game.
Silver dubs this the “Pareto Principle of Prediction”—it applies well beyond poker, to a large class of activities based on skills that require deliberate practice.
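Silver’s graph comes without a formula, so any code can only be a sketch. Assuming, purely for illustration, a power curve gain = effort^a, with both quantities expressed as fractions of their maximum, the exponent a = ln 0.8 / ln 0.2 ≈ 0.139 is the one that makes 20% of the effort yield 80% of the gain. The power curve is my choice of a convenient concave shape, not something Silver specifies.

```python
import math

# ASSUMPTION: Silver gives no formula; a power curve is just one convenient
# concave shape. Effort and gain are fractions (0.0 to 1.0) of their maximum.
EXPONENT = math.log(0.8) / math.log(0.2)  # makes gain(0.2) equal 0.8, up to rounding


def gain(effort: float) -> float:
    """Fraction of the maximum gain reached with a given fraction of the maximum effort."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be between 0 and 1")
    return effort ** EXPONENT


for effort in (0.05, 0.2, 0.5, 0.8, 1.0):
    print(f"effort {effort:>4.0%} -> gain {gain(effort):>5.1%}")
```

Under this assumed curve, going from 80% to 100% of the effort adds only about 3% of the maximum gain: a crude picture of diminishing returns, though a single power curve does not reproduce a distinct middle stage.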
Raising the waterline
The curve divides into three parts: in the first part, progress is very rapid, disproportionate to the amount of effort you put in. In the middle, you are “grinding”—progress is steadier, requiring the accumulation and honing of a number of distinct techniques. Finally, as you near the top of the performance curve, ever-smaller gains in performance require ever-greater refinement of your existing skills and acquisition of subtle nuances of technique. This is the ten-thousand hour domain, that of “mastery”.
The “waterline” represents the typical level of performance that you can expect to see in the player population. Silver represents it as a horizontal line, so we need to think of it as a “gain” level—the typical (say, median) poker player is earning (or losing) the amount of money per time period implied by the particular position of the waterline.
The key idea behind the waterline is that in many cases what matters is not how well you do in an absolute sense, but how well you do relative to the competition. This is especially true in a zero-sum game! If the waterline is low, you can make very handsome gains at the cost of a limited investment in acquiring the basic skills. But if the waterline is high, you have no alternative to grinding your way through the lower-level insights until you finally level up enough to beat the competition:
When a field is highly competitive, it is only through this painstaking effort around the margin that you can make any money. There is a “water level” established by the competition and your profit will be like the tip of an iceberg: a small sliver of competitive advantage floating just above the surface, but concealing a vast bulwark of effort that went in to support it.
(This, Silver argues, is what happened to online poker after 2006 and the Unlawful Internet Gambling Enforcement Act. The professional or semi-professional players, who derived an actual income stream from the game, kept playing, but the weaker amateurs quit in the face of a tougher environment. Bad for poker, maybe; good for rationalists, at least insofar as it led Silver to start work on his election forecasting site, FiveThirtyEight.com, which has debunked a fair amount of madness about political polling, and ultimately to his book, which may introduce a broader population to largely Less Wrong-compatible ideas.)
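To see why the departure of weak players hurts even a player whose skill has not changed, here is a toy zero-sum model in Python. It is an assumption for illustration, not Silver’s math: each player gets a single hypothetical skill score, and expected profit per hand is that player’s skill minus the average skill of the opponents. Summed over any table these profits cancel out, which is what makes the toy game zero-sum (ignoring the rake).

```python
from statistics import mean
from typing import List


def expected_profit(my_skill: float, opponents: List[float]) -> float:
    """Toy model (assumption): profit per hand is skill relative to the opponents' average."""
    return my_skill - mean(opponents)


def table_profits(skills: List[float]) -> List[float]:
    """Expected profit of every player at a table; in this model they sum to zero."""
    return [
        expected_profit(s, skills[:i] + skills[i + 1:])
        for i, s in enumerate(skills)
    ]


me = 0.6                        # hypothetical skill score, unchanged throughout
before = [0.2, 0.3, 0.3, 0.4]   # a table with plenty of weak amateurs
after = [0.7, 0.8, 0.9]         # the amateurs have quit; stronger regulars remain

print(round(expected_profit(me, before), 2))  # 0.3
print(round(expected_profit(me, after), 2))   # -0.2
```

The player’s skill never changes, yet the expected result flips from a profit to a loss once the field’s average rises above it: the waterline moved, not the player.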
The (somewhat) bad news
Another important caveat here—this is, as far as I can tell, purely a conceptual model: I’m not aware that there’s much hard empirical data that supports the curve having the shape pictured above. Silver does have statistics to show that in the case of poker, the population of mediocre players has a key role in “feeding” the better players, and based on before-and-after numbers, makes a good case that bad players drying up is bad for the better players. However, he doesn’t cite anything that would suggest the precise relationship between effort and gain has been measured and shown to fit the curve.
The good news
This model and its more fleshed-out description of what a “ridiculously low” waterline entails are great news: they give us some testable predictions. Suppose you find yourself in a competitive situation; conditional on the Pareto principle being applicable there, if you notice that applying a short set of uncomplicated techniques reliably results in outperforming your peers, that constitutes some evidence of a low waterline.
For instance, if it is indeed the case that the waterline is very low in the skill of “thinking probabilistically”, then those of us who have heeded the lessons of Less Wrong should be able to perform very well in a competitive forecasting environment, by applying only a few basic tools that unfortunately (for them) the general population doesn’t yet possess.
Failure to observe this would also have interesting consequences, lending weight to alternative hypotheses, for instance: (a) we’re not as good at applying even those basic techniques as we thought we were; (b) this particular competitive domain is not governed by a Pareto distribution after all; or (c) the waterline is not as low as we thought it was.
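The reasoning in the last three paragraphs is an ordinary Bayesian update, which can be sketched in a few lines of Python. All numbers below are hypothetical placeholders, not estimates: a prior over four hypotheses (the model’s premises all hold, or one of the alternatives (a), (b), (c) holds instead, treated as mutually exclusive for simplicity) and a made-up probability of clearly outperforming one’s peers under each.

```python
from typing import Dict


def update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule over mutually exclusive hypotheses: P(H|E) is proportional to P(E|H) * P(H)."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


# Hypothetical numbers, for illustration only.
# "model holds": low waterline, Pareto-shaped domain, and we apply the basics well.
prior = {"model holds": 0.4, "(a)": 0.2, "(b)": 0.2, "(c)": 0.2}

# P(we clearly outperform our peers | hypothesis), equally hypothetical.
p_outperform = dict(zip(prior, (0.8, 0.2, 0.3, 0.2)))
p_fail = {h: 1.0 - p for h, p in p_outperform.items()}

for label, likelihood in (("outperform", p_outperform), ("fail", p_fail)):
    post = update(prior, likelihood)
    print(label, {h: round(p, 2) for h, p in post.items()})
```

With these made-up inputs, the first hypothesis rises from 0.4 to roughly 0.7 after we outperform, and drops to roughly 0.15 if we fail to, with most of the lost probability flowing to (a) and (c).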
It turns out I was able to get some experience in just the kind of setting where this type of test could be performed: the Good Judgment Project. That’s my next post.
¹ This post is one way I hope to redeem those hours, turning them into a more productive effort after the fact.
P.S.: as this has been asked elsewhere, would I recommend Nate Silver’s book? I certainly enjoyed it a lot, even though not all of it was new to me, and I often found myself wishing for clearer separation between the undoubtedly fascinating stories he tells—about financial panics, stock market crashes, baseball, poker, presidential elections, and so on—and the insights he draws from them, such as the waterline model. But enjoying it isn’t quite the same as being ready to endorse it to others; I’m not quite sure yet how much value it would have, depending on the audience. Veteran Less Wrongers might not respond to it the same way as the general public, for instance. I found it valuable enough that I might invest some time in writing a short chapter-by-chapter summary and overall review, to answer that question for myself.
I think the biggest problem I initially have with accepting Silver’s graph is the lack of evidence he gives for that arch. Putting that shape on a graph has quite a few ramifications.
Do you feel that the evidence he gave supported that shape?
That’s my biggest problem too. :)
I can’t speak entirely to the evidence he offers, but it does feel like skill acquisition in many domains works as he suggests: very rapid gains initially, then a period where progress feels like you work for it, then a plateau where breakthroughs become few and far between and require insane amounts of effort.
This is what happened when I was playing Go, for instance. The “waterline” in amateur Go is roughly around 1 dan, the rank most players find it hardest to reach. However, in Go the evidence doesn’t support that particular shape very well. There are ways this could be explained, such as frustrated “permanent kyu” players quitting the game. (As I did eventually, at a KGS 2nd kyu rank I’m not sure I even deserve, as it comes mostly from playing blitz games.)
I think it’s mostly the shape of that curve. Why does it hit 80% gain at only 20% effort? Is that the same across many different tasks?
I’m a writer (novelist), and it’s a common statement in writing circles (the ones I’m in, at least) that every writer has a million words of crap to get out. That’s a rough estimate, of course, and I’ve always taken it to show that you have to work hard at your craft to improve. At an average of 1k words/hour, that’s a good thousand hours of nothing but writing to get out.
Is that 20% effort? 50%? 80%? How does one chart or measure that?
Because otherwise it wouldn’t fit into the 80⁄20 principle. :/
As a non-veteran Less Wronger, I found this book both enjoyable and valuable.
The value came from the same place as the enjoyment—despite knowing about our many flaws in thinking (I’ve read the sequences and a few rationality books), it’s different when you see real-world examples. Specifically, this book partially motivated me to start re-learning the actual math (e.g. http://ocw.mit.edu/courses/economics/14-30-introduction-to-statistical-methods-in-economics-spring-2009/lecture-notes/).
The Udacity Statistics 101 course (that I started before summer but, hmm, am now on extended hiatus from) covers much the same ground, apparently. If you’re into video courses with optional Python programming exercises.
Does Silver ever employ the phrases ‘diminishing returns’ and ‘deliberate practice’ in discussing these models?
Nope, that’s me.
ETA: OK, one of these is me and the other is Silver. Glad that’s sorted out. :)
50% untrue.
“In cases like these, it can require a lot of extra effort to beat the competition. You will find that you soon encounter diminishing returns.”
I’d like to see each of the rationality/LW-associated efforts to raise the sanity waterline written up in a more formalised, structured fashion.
Perhaps something like the following for each of EA, GJP, FHI, etc., because the strategic reasons behind many of these independent entities aren’t clear to me.
Well, I’m a sailor and raising the waterline is a bad thing. You’re underwater when the waterline gets too high.
The analogy here would clearly be raising the waterline of the body of water upon which you are floating. And since you are, in fact, floating the waterline has no direct effect on you.
The lesson here is that much of the gain from raising the sanity waterline is that, for those impressive minds that excel or stay slightly ahead of the pack, the raised sanity waterline pushes them ever further in achievement and sanity.
It’s possible that we could benefit from a better metaphor. What we really mean is raising the baseline rationality level, but perhaps we can find a more vivid way to say it.