Raising the waterline

Among the goals of Less Wrong is to “raise the sanity waterline” of humanity. We’ve also talked about “raising the rationality waterline”: the phrase is somewhat popular around these parts, which suggests that the metaphor is catchy. But is that all there is to it, a catchy metaphor? Or can the phrase be more usefully cashed out?

While reading Nate Silver’s The Signal and the Noise, I came across a discussion of “raising the waterline” which fleshes out the metaphor with a more substantial model. This model preserves some of the salient aspects of the metaphor as discussed on LW, for instance the perception that the current waterline (as regards sanity and rationality) is “ridiculously low”. More interestingly, it fleshes out some of the specific ways that a “waterline” belief should constrain our future sensory experiences, maybe even to the point of quantifying what should result from low (or rising) waterlines.

This is intended as a short series:

“Raising the waterline”, this introductory post, will summarize Nate Silver’s “waterline” model, within its original context of playing Poker, which Silver frames as a game of prediction under uncertainty. Poker therefore serves as a “toy model” for a much more general class of problems.
“Raising the forecasting waterline” will extend the discussion to the kind of forecasts studied by Philip Tetlock’s Good Judgment Project, a prediction game somewhat similar to PredictionBook and related to prediction markets; I will leverage the waterline model to extract useful insights from my participation in GJP.
“Raising the discussion waterline”, a shamelessly speculative coda, will relate the previous two posts to the question of “how do Internet discussions reliably lead to correct inferences from true beliefs, or fail to do so”; I will argue that the waterline model brings some hope that a few basic tactics could nevertheless provide large wins, and raise the more general question of what other low waterlines we could aim to exploit.

The Model

The waterline model is introduced in Chapter 10, “The Poker Bubble”, to explain how, for a period in the 2000s, Silver found it fairly easy to make a living from playing online poker, and why this source of revenue later dried up altogether.

I was one of those people. I lived the poker dream for a while, and then it died. I learned that poker sits at the muddy confluence of the signal and the noise. My years in the game taught me a great deal about the role that chance plays in our lives and the delusions it can produce when we seek to understand the world and predict its course.

One graph neatly summarizes all features of the model, and strikes me as a good candidate for illustrating the “one picture worth a thousand words” dictum:

The horizontal dimension is “effort” or “experience”. One possible unit of measurement there could be “hours”—with the caveat that they should be hours of deliberate practice. (I’m not entirely sure why this is expressed as a percentage—I’ll come back to taking the graph with a grain of salt.)

The vertical dimension is labeled “accuracy”, but we could more simply call it “gain”. One possible unit of measurement could be “money earned over some period of time playing the game”.

Chance, skill and practice

I want to briefly come back to the distinction between experience and practice, as this is a very important but often missed point.

Poker, like all games of chance, is driven by random reinforcement and is highly addictive. (In the days after Silver’s description piqued my interest and I decided to give it a try, I found myself losing more hours to the game than I care to think about, and I was only playing computer opponents for virtual chips, poker’s methadone compared to the crack-like properties of online play for real money.¹) It is entirely possible to spend a lot of time in actual play without ever having much to show for it in terms of improvement.

Thus, a less easily measurable but more appropriate construct for the horizontal axis would be something like “number of basic insights absorbed, in the appropriate order”. One way to operationalize this would be to devise a number of tests, for instance, and apply these tests externally to the behaviour of a player: do they fold under circumstance X, raise under circumstance Y, are they able to quantify this or that aspect of the game?

This distinction is well-known in other domains, such as software engineering, where “10 years of experience is not the same as one year of experience repeated 10 times” turns out to be a useful mantra in hiring situations.

Poker’s Pareto Principle

The most interesting features of the model are the curve itself, relating effort and gain; and the “waterline”.

The curve isn’t linear, but follows the same shape as a Pareto distribution: it obeys the “80/20” principle most often associated with Pareto’s original observation in the domain of economics, that twenty percent of the population holds eighty percent of the wealth. Here, the idea is that twenty percent of the effort is enough to get you to a level of performance better than eighty percent of the population. Not bad!

In poker, for instance, simply learning to fold your worst hands, bet your best ones, and make some effort to consider what your opponent holds will substantially mitigate your losses. If you are willing to do this, then perhaps 80 percent of the time you will be making the same decision as one of the best poker players like Dwan—even if you have spent only 20 percent as much time studying the game.

Silver dubs this the “Pareto Principle of Prediction”—it applies well beyond poker, to a large class of activities based on skills that require deliberate practice.
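To make the shape of such a curve concrete, here is a minimal sketch in Python. It takes the 80 percent / 20 percent figures from the quote above at face value and assumes a power-law curve, accuracy = effort ** K. That functional form, and the names K and accuracy, are my own illustration; Silver does not give a formula.

```python
import math

# Assumption: a power-law curve, picked only because it is a simple concave
# shape passing through (0, 0), (0.2, 0.8) and (1, 1). Not from Silver's book.
K = math.log(0.8) / math.log(0.2)  # exponent, roughly 0.14


def accuracy(effort: float) -> float:
    """Hypothetical accuracy (0..1) reached after a given share of effort (0..1)."""
    if not 0.0 <= effort <= 1.0:
        raise ValueError("effort must be between 0 and 1")
    return effort ** K


if __name__ == "__main__":
    # Print a few points to show how quickly the curve flattens out.
    for effort in (0.05, 0.2, 0.5, 1.0):
        print(f"effort {effort:.0%} -> accuracy {accuracy(effort):.0%}")
```

Under this assumed curve most of the accuracy arrives very early, and the remaining gains are spread thinly over the rest of the effort axis.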

Raising the waterline

The curve divides into three parts: in the first part, progress is very rapid, disproportionate to the amount of effort you put in. In the middle, you are “grinding”—progress is steadier, requiring the accumulation and honing of a number of distinct techniques. Finally, as you near the top of the performance curve, ever-smaller gains in performance require ever-greater refinement of your existing skills and acquisition of subtle nuances of technique. This is the ten-thousand hour domain, that of “mastery”.

The “waterline” represents the typical level of performance that you can expect to see in the player population. Silver represents it as a horizontal line, so we need to think of it as a “gain” level—the typical (say, median) poker player is earning (or losing) the amount of money per time period implied by the particular position of the waterline.

The key idea in the waterline is that in many cases it’s not how well you do in an absolute sense that matters—it’s how well you do relative to the competition. This is especially true in a zero-sum game! If the waterline is low, then you can make very handsome gains at the cost of a limited investment in acquiring the basic skills. But if the waterline is high, there is no alternative, before you can beat the competition, to grinding your way through the lower insights, until you finally level up enough:

When a field is highly competitive, it is only through this painstaking effort around the margin that you can make any money. There is a “water level” established by the competition and your profit will be like the tip of an iceberg: a small sliver of competitive advantage floating just above the surface, but concealing a vast bulwark of effort that went in to support it.

(This, Silver argues, is what happened to online poker after 2006 and the Unlawful Internet Gambling Enforcement Act. The professional or semi-professional players, who derived an actual income stream from the game, continued playing, but the weaker amateurs quit, in the face of a tougher environment. Bad for poker, maybe; good for rationalists, at least insofar as it led Silver to start work on his election forecasting site FiveThirtyEight.com which has debunked a fair amount of madness about political polling, and ultimately to his book, which may introduce a broader population to largely Less Wrong compatible ideas.)
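The iceberg image can be sketched numerically. The snippet below assumes a power-law curve (accuracy = effort ** K, calibrated so that 20 percent of the effort gives 80 percent accuracy) and uses hypothetical waterline values; the curve, the function names and the numbers are all illustrative assumptions, not taken from Silver’s book.

```python
import math

# Assumption: same illustrative power-law curve, calibrated on the 80/20 point.
K = math.log(0.8) / math.log(0.2)


def accuracy(effort: float) -> float:
    """Hypothetical accuracy (0..1) for a given share of effort (0..1)."""
    return effort ** K


def effort_to_reach(waterline: float) -> float:
    """Share of effort needed just to match the waterline (inverse of accuracy)."""
    if not 0.0 <= waterline <= 1.0:
        raise ValueError("waterline must be between 0 and 1")
    return waterline ** (1.0 / K)


def edge(effort: float, waterline: float) -> float:
    """Margin over the typical player: positive means profit, negative means loss."""
    return accuracy(effort) - waterline


if __name__ == "__main__":
    # Hypothetical waterlines: a soft field versus a highly competitive one.
    for waterline in (0.5, 0.95):
        needed = effort_to_reach(waterline)
        margin = edge(0.2, waterline)
        print(f"waterline {waterline:.0%}: break even at {needed:.1%} effort, "
              f"edge at 20% effort = {margin:+.0%}")
```

With a low waterline, the same modest investment that barely registers on the effort axis already leaves a comfortable margin; with a high one, the player who has absorbed only the basics is under water.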

The (somewhat) bad news

Another important caveat here—this is, as far as I can tell, purely a conceptual model: I’m not aware that there’s much hard empirical data that supports the curve having the shape pictured above. Silver does have statistics to show that in the case of poker, the population of mediocre players has a key role in “feeding” the better players, and based on before-and-after numbers, makes a good case that bad players drying up is bad for the better players. However, he doesn’t cite anything that would suggest the precise relationship between effort and gain has been measured and shown to fit the curve.

The good news

This model and its more fleshed-out description of what a “ridiculously low” waterline entails are great news: they give us some testable predictions. Suppose you find yourself in a competitive situation; conditional on the Pareto principle being applicable there, if you notice that applying a short set of uncomplicated techniques reliably results in outperforming your peers, that constitutes some evidence of a low waterline.

For instance, if it is indeed the case that the waterline is very low in the skill of “thinking probabilistically”, then those of us who have heeded the lessons of Less Wrong should be able to perform very well in a competitive forecasting environment, by applying only a few basic tools that unfortunately (for them) the general population doesn’t yet possess.

Failure to observe this would also have interesting consequences, raising the strength of alternate hypotheses: for instance a) we’re not as good at applying even those basic techniques as we thought we were, or b) this particular competitive domain is not after all governed by a Pareto distribution, or c) the waterline is not as low as we thought it was.
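The way such an observation shifts belief between these hypotheses can be sketched as a simple Bayesian update. Everything numeric below is made up for illustration: the priors, the likelihoods, and the hypothesis labels are assumptions, not measurements.

```python
# Hypothetical priors over the main hypothesis and the three alternatives.
PRIOR = {
    "low waterline, basics applied well": 0.4,
    "a) basics applied poorly": 0.2,
    "b) domain not Pareto-like": 0.2,
    "c) waterline not that low": 0.2,
}

# Hypothetical probability of outperforming peers under each hypothesis.
P_OUTPERFORM = {
    "low waterline, basics applied well": 0.8,
    "a) basics applied poorly": 0.2,
    "b) domain not Pareto-like": 0.3,
    "c) waterline not that low": 0.2,
}


def update(prior: dict, likelihood: dict) -> dict:
    """Bayes' rule: multiply prior by likelihood, then normalize to sum to 1."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


def after_observation(outperformed: bool) -> dict:
    """Posterior after seeing whether the basic techniques beat the field."""
    likelihood = {
        h: (p if outperformed else 1.0 - p) for h, p in P_OUTPERFORM.items()
    }
    return update(PRIOR, likelihood)


if __name__ == "__main__":
    for outcome in (True, False):
        print("outperformed" if outcome else "did not outperform")
        for hypothesis, prob in after_observation(outcome).items():
            print(f"  {hypothesis}: {prob:.2f}")
```

With these made-up numbers, success raises the probability of the main hypothesis and failure redistributes it across the three alternatives, which is all the argument above needs.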

It turns out I was able to get some experience in just the kind of setting where this type of test could be performed: the Good Judgment Project. That’s my next post.

¹ This post is one way I hope to redeem those hours, turning them into a more productive effort after the fact.

P.S.: as this has been asked elsewhere, would I recommend Nate Silver’s book? I certainly enjoyed it a lot, even though not all of it was new to me, and I often found myself wishing for clearer separation between the undoubtedly fascinating stories he tells—about financial panics, stock market crashes, baseball, poker, presidential elections, and so on—and the insights he draws from them, such as the waterline model. But enjoying it isn’t quite the same as being ready to endorse it to others; I’m not quite sure yet how much value it would have, depending on the audience. Veteran Less Wrongers might not respond to it the same way as the general public, for instance. I found it valuable enough that I might invest some time in writing a short chapter-by-chapter summary and overall review, to answer that question for myself.