Criticism of some popular LW articles
My composition teacher in college told me that in some pottery schools, the teacher holds up your pot, examines it, comments on it, and then smashes it on the floor. They do this for your first 100 pots.
In that spirit, this post's epistemic status is SMASH THIS POT.
As an experiment, I’m choosing three popular LW posts that happen to be at the top of my feed. I’m looking for the quotes I most disagree with, and digging into that disagreement. This is because I notice I have a tendency to just consume curated LW content passively. I’d instead like to approach it with a more assertively skeptical mindset.
Though I believe in the principle of charity, I think that a comparative advantage of LW over other online spaces is that it is a space where frank and public object-level disagreement is permitted, as long as we avoid ad hominems or trolling.
Furthermore, I’m focusing here on what I took away from these posts, rather than what these authors intended, or what they learned by writing them. Insofar as the author was specifically trying to convince me to take these statements seriously, they failed. Insofar as they had some other purpose in mind, I have absolutely no opinion on the matter.
So with personal apologies to the three excellent writers I happened to select for this exercise, here I go.
1. The Mystery of the Haunted Rationalist
Quote I disagree with:
So although it’s correct to say that the skeptics’ emotions overwhelmed their rationality, they wouldn’t have those emotions unless they thought on some level that ghosts were worth getting scared about.
No. They had those emotions because they thought on some level that dark, unfamiliar environments that their community says are scary might be unsafe. Hence, I would reword Scott's line as: “This looks suspiciously like I’m making an expected utility calculation. Probability of being killed by something dangerous living in the house * value of my life, compared to a million dollars.”
Impact on conclusion:
But if that’s true, we’re now up to three different levels of belief. The one I profess to my friends, the one that controls my anticipation, and the one that influences my emotions.
There are no ghosts, profess skepticism.
There are no ghosts, take the bet.
There are ghosts, run for your life!
Instead of three levels of belief, we have three levels of storytelling. The story my friends and I share, the story that controls my anticipation, and the story that’s hardwired into my amygdala.
Wouldn’t it be fun to stay the night in a “haunted house?”
There’s most likely no danger, take the bet.
There’s a small but important possibility of danger, run for your life!
Scott is conflating the circumstances under which two people of his background are likely to stay the night in a haunted house with the circumstances under which two skeptics might have a serious need to disprove the existence of ghosts by staying the night in a haunted house. The Korean fear of fan death is not very much like the Euro-American fear of haunted houses, which, unlike fan death, is obviously not endorsed by local scientists or the government.
2. Tallest Pygmy Effect
Quote I disagree with:
Tallest pygmy effects are fragile, especially when they are reliant on self-fulfilling prophecies or network effects. If everyone suddenly thought the Euro was the most stable currency, the resulting switch would destabilize the dollar and hurt both its value and the US economy as a whole.
This is begging the question. If everyone suddenly thought the Euro was the most stable currency, something dramatic would have to have happened to shift the stock market’s assessment of the fundamentals of the US vs. EU economies and governments. Economies are neither fragile nor passive, and these kinds of mass shifts in opinion on economic matters don’t blow with the wind. Furthermore, people are likely to hedge their bets. If the US and EU currencies are similar in perceived stability, serious investors are likely to diversify.
“Tallest pygmy effect” is another term for “absolute advantage,” but with the added baggage of being potentially offensive, pumping our intuitions toward seeing institutions as human-like in scale, and being disconnected from the terms “comparative advantage” and “absolute advantage,” which are standard economic jargon and have useful tie-ins with widely available pedagogical material in that field.
Impact on conclusion:
We shouldn’t use the term “tallest pygmy effect,” and should be very skeptical of LessWrong takes on economic issues unless there’s strong evidence the presenter knows what they’re talking about. This updates me in the direction of popularity being a poor proxy for accuracy or usefulness.
3. Is Rhetoric Worth Learning?
Quote I disagree with:
Quote 1:
On LessWrong, people often make a hard distinction between being correct and being persuasive; one is rational while the other is “dark arts.”
No. On LessWrong, people use the term “dark arts” to refer specifically to techniques of persuasion that deliberately interfere with people’s truth-seeking abilities. Examples might include personal attacks such as shaming, ostracization, or threats; deliberate fallacious reasoning such as fort-and-field or Gish Gallop tactics; or techniques of obfuscation and straight-up lying.
Being persuasive isn’t “dark arts”; it’s just that good rhetoric is equally useful to anyone and is thus a symmetric weapon, one whose use doesn’t inherently help humanity progress toward truth.
Quote 2:
From a societal perspective, making any kind of improvement, at any scale above literally one-man jobs, depends on both correctness and persuasiveness. If you want to achieve an outcome, you fail if you propose the wrong method and if you can’t persuade anyone of the right method.
This is true, but rhetoric is only one small part of persuasion. Since this post is about rhetoric, but its importance is justified on the basis of the necessity of persuasion, I think this is a point that needs to be made.
Quote 3:
Some of the things I think go into talking well:
Emotional Skills
How to be aware of other people’s points of view without merging with them
How to dare to use a loud, clear voice or definite language
How to restrain yourself from anger or upset
[Etc.]
An analogous list might be:
Some of the things I think go into doing math well:
Adding skills
Knowing how to add small numbers in your head.
Knowing how to transform repeated addition into multiplication.
Knowing that the more positive numbers you add together, the larger the output gets.
[Etc.]
While it’s true that someone who lacks the skills listed probably has some major shortcomings in the rhetoric or math departments, I think it’s unlikely that either of these lists is a useful decomposition of these skills. Both pump our intuitions in the direction of “if I practice these specific skills, I’ll get better at math.” Or “if I practice the skills on this list I intuitively think I’m bad at, I’ll get better at rhetoric overall.”
In fact, it’s not at all clear to me that the expected value of this list as a pedagogical tool to teach rhetoric is net positive, or that it even gets across the basic idea that the author intended. It’s the kind of thing I want to keep my System 1 away from so that it doesn’t get sucked in and mislead my System 2.
Impact on conclusion:
Rhetoric might be worth learning, but there’s also a reason we have professional editors. Division of labor is important, and it’s not clear that really good rhetoric is that much better at persuasion than an oft-chanted slogan. In fact, it’s perfectly possible that good rhetoric and a correct argument are merely correlated, both caused by underlying general intelligence and sheer depth of study. It’s also possible that rhetoric is not a symmetric weapon, and that it’s easier to dress a correct idea in persuasive rhetoric than to so present an incorrect idea.
Hey—why do we all seem to assume that rhetoric is uncorrelated with truth, anyway?
Aristotle didn’t seem to think so:
Nevertheless, the underlying facts do not lend themselves equally well to the contrary views. No; things that are true and things that are better are, by their nature, practically always easier to prove and easier to believe in.
Assessment:
Respectively, I see these posts as featuring a misguided comparison, displaying a lack of scholarship and putting forth an unfounded assertion as a stylized fact, and meandering around on a topic rather than delivering on the promise implied by the title.
It feels intuitively true that our minds have ~separate systems for justifying our beliefs and changing our beliefs. Reading a post while looking for things to disagree with, and with the intention of stating those points of disagreement clearly, feels different from my normal consumption patterns.
I notice myself feeling nervous both about a negative reaction to my takes on these posts and about the possibility that others might return the favor when they read my posts.
Overall, this experience leaves me with two equally concerning and compatible conjectures.
a. My reaction to rationalist content is governed by my frame of mind. If I read them seeking wisdom, then wisdom I shall find. If I read them to criticize, then I’ll find things to be critical of. Without some more formal structure in place, the nature of which I’m unaware, I am not able to “assess” content for correctness or usefulness. I can only produce positive or negative feedback. This reminds me of Eliezer’s piece Against Devil’s Advocacy, except that I’m less convinced than he seems to be that there’s such a thing as “true thinking” distinct from the production of rationalizations. Maybe if we knew what the truth was, we could measure how efficiently different modes of thinking would get us there. But that’s the problem, right? We don’t know the truth, don’t know the goal, and so it’s very hard to put a finger on this “true thinking” thing.
b. There is a lot of error-ridden content on LessWrong. The more true (a) is, then the more true I should expect (b) to be. And if error-ridden content can influence my frame of mind, then the more true (b) is, the more likely (a) is to be true as well. Reading LessWrong is like trying to learn a subject by reading student essays. It’s not a good strategy.
This experiment shifts me toward seeing LessWrong as closer to a serious-minded fan fic community than a body of scholarship. You can learn to write really well by producing fan fic. But you probably can’t learn to write well by reading primarily fan fic. Yet somebody needs to read the fan fic to make the writing of it a meaningful exercise. So reading and commenting on LessWrong is something we do as an altruistic act, in order to support the learning of our community. Posting on it, reading external sources, and inquiring into the conditions of our own lives and telling our own stories is how we transform ourselves into better thinkers.
I believe that reading and writing for LessWrong has made me a much better thinker and enhanced my leadership skills. It has my blessing. Scott Alexander and Elizabeth, my first two victims, are thinkers whom I respect and whose writings and conversation I’ve found useful. Thanks also to sarahconstantin, whose writing I read for the first time today, as far as I know, and whose thoughts on rhetoric I found interesting and insightful.
I think when assessing LessWrong it is important to think of posts and their comments as a single entity. Many of the objections to the posts that you mention are also brought up in the comment sections, often themselves highly upvoted (in the first example the comment has 1 more karma than the post).
If you take upvotes to mean you are glad something was posted then I don’t think it is inconsistent to upvote something you think contains an error. Therefore high karma alone shouldn’t be enough to consider something to have been considered correct by the LW community, just that it has some value. If there are also no/only minor critical comments then I think that is much stronger evidence.
I don’t think this completely exonerates LW but I think it means the picture is not quite as bleak as it would first appear.
(Edited to add: As an example I upvoted this post despite this comment as I think this is an important kind of thing to look at and agree that my frame of mind can be important in how I read content)
This fits with my depiction of LW as a sort of serious-minded scholarly fanfic community.
I think it can be valuable to read the post alone, write down your reaction to it, and only then examine the comments. If somebody else made the same critique as you, it’s a bit like pre-registering an experiment. You can have more confidence that you’re doing your own thinking, and that you’re converging on the truth. Perhaps this is a way to get out of the “I can only produce rationalizations” dilemma.
I think I didn’t get the fanfic analogy at first. Could I summarise it as “Lesswrong is to scholarship as serious fanfic is to original novels”?
I know the LW team have spoken about a level above curated which would be intended to be more on the level of scholarship. I think the 2018 review was designed to serve this purpose so we should hope that these posts in particular don’t contain any glaring errors!
I think it’s super valuable to be able to put imperfect ideas out there (I see one of the 2018 review top posts was Babble!), but thinking about this has really emphasised to me how useful epistemic statuses are.
To your second paragraph—yes, definitely this! When I do this it definitely gets me out of the habit of passive reading.
As another example, when I get into a debate with someone in the comments section, I tend to upvote the other person’s comments as long as they’re reasonably well-thought-out and well-written.
I don’t think the criticism of post 2 here is on point at all. Elizabeth is making the claim that if everyone shifted from thinking dollars most reliable to thinking euros, that this would be self-fulfilling and have big impacts. This seems right, regardless of why this happened. The response seems wrong in four ways.
One, I don’t think that there needs to be an overarching reason. It’s not crazy that a propaganda campaign (someone ‘talking their book’ on a large enough level) combined with large bets could cause a cascade effect in worlds where that wouldn’t have otherwise happened.
Two, it’s the currency and not the stock market that matters here. Stock market is a different thing. Not central but worth noting.
Three, the currency markets are exactly the thing Elizabeth is talking about—anticipated future prices, which are a function of supply and demand. A lot of the demand for dollars is the expectation that people will demand dollars because business with others is done in dollars, etc. Hitting the tipping point would cause a cascade. The idea that markets are about some ‘fundamentals’ and causation is one-directional simply isn’t right. Expectations are huge.
Four, even if in this particular example we are not close to a switch, that would only mean it’s a bad example. The principle certainly holds. E.g. it is easy to imagine worlds in which there was a ‘flippening’ and ETH took over the BTC role as primary method of cryptocurrency payment some time in 2018, without anything fundamental changing. There’s no reason to think that would have become undone—likely the opposite, and if ETH had passed BTC it would have pulled further away over time.
To the contrary, I think the criticism of post 2 is very on point. But Zvi and I are looking at two different parts: Zvi’s looking at the logic/begging the question part, and I’m looking at the critique. In thought experiments, we can take imagined exogenous changes to be exogenous even though in the real world they’d be endogenous (i.e., we can take them as events rather than outcomes). Later, we can relax that assumption; the endogeneity problem is important for understanding whether the conclusions extend to the real world, but it is not important for understanding what the conclusions are within the thought experiment. So I agree with Zvi that the logic isn’t really an issue here.
However, I do believe this is a bad example (/weak post, sorry, Elizabeth) precisely for the reason AllAmericanBreakfast pointed out: it frames basic economics knowledge as a new insight. Admittedly, the EconLog post that was linked to doesn’t discuss comparative advantage either, but that’s because it’s really just about the “flight to safety” in 2008, where capital has to go somewhere, so it goes to the safest haven; even if that place is on fire, at least it’s not on fire next to a ticking time bomb. But if you really want to talk about the “benefit not from absolute skill or value at a thing, but by being better at it than anyone else,” then you can just consult microeconomics 101 (literally) and read up on absolute vs. comparative advantage. And then a better example of it is what you would find in the textbook (ha, probably Mankiw’s) of English cloth vs. Portuguese wine, which clearly illustrates the concepts.
Or maybe Elizabeth really wasn’t referring to comparative advantage, but more specifically to “when a superlative is applied in a context and the context is later lost.” This might seem to apply better to the USD (we think of it as a safe haven because we used to think of it as a safe haven), but again the USD is not an apt example here, because the context isn’t lost, it just changed (e.g., suppose the USD scores a 10/10 at being a currency, and things change so that now it’s a terrible 3/10, but it’s still better than all the rest). The Tallest Pygmy derives its tension from the fact that you think you’ve found someone “tall,” but it’s just among the pygmies you’re sampling. The Tallest Pygmy, then, is best understood as getting stuck in a valley at a local, but not global, minimum (gradient descent). Or peaking at a local, but not global, maximum. Sometimes you are fine with local maxima, but if you are optimizing for global maxima, then obviously this creates a problem. May as well go with a classic example instead, which clearly illustrates sampling bias (statistics).
You see this in the academic literature as well, where people refer to concepts as “effects.” I think it is a good idea to be skeptical of those findings: not that they are fake, just that more clarity could be gained from understanding the core concept that generates the effect. Elizabeth’s example is not great for comparative advantage, nor for gradient descent/sampling bias. The USD in 2008 is a “lesser of two evils effect,” or really not an effect at all. If you have a choice between 10%, 9%, and 8% returns at equal risk, you choose 10%; if a regime change occurs that makes you choose between 5%, 4.5%, and 4%, you choose 5%. It’s worse than before, but it’s the best around.
LessWrong is a great community to be in, but AllAmericanBreakfast is correct that many posts stumble upon “new” insights that are really just symptomatic of not having done enough research, particularly when it comes to economics. And that’s okay in this forum, we’re all trying to figure this stuff out!
The TPE as defined is
This has two conditions:
1. The benefit must accrue only to the best option.
2. The difference between the best and second best option must be small.
You and Elizabeth are adding two further conditions, which produce mutability:
3. It is possible for the second-best option to overtake the best option (a ‘flippening’).
And fragility:
4. It is easy for a flippening to occur.
The general mutability of a TPE seems uncontroversial, as does the existence of fragile TPEs. But mutability does not imply fragility, and Elizabeth specifically says that it does.
My mistake was in engaging with the concrete example of the US vs. EU currency, in an attempt to demonstrate that TPEs aren’t necessarily fragile. Decomposing the term and argument into atomic propositions to show where the transition from definition->empirical assertion occurs would have been a safer way to do it.
I think my underlying motivation for attacking the example is that I’m frequently exposed to people with excessive belief in economic fragility. For example, the idea that “if we all just stopped believing in the value of money, it would be just paper.” I saw this point of view behind terms like “fragile” and “self-fulfilling prophecy.”
I’m sure that for others, there’s an excessive perception of economic stability. Things change all the time, and sometimes it comes down to a pure change in public perception.
In the future, I’d like to employ the practice of decomposition before I start writing a critical response.
Elizabeth uses the word “fragile”, but doesn’t say that this is what it means. I’m not sure exactly what she means by it—and I’m not sure if the thing she means is true—but I don’t think this is a likely guess.
(My guess would be something like “unstable”, in the sense that once it goes away, there’s no particular reason to expect it to come back.)
Not specifically. She merely says that TPEs are fragile. This is somewhat a nitpick, but… I feel like you’re trying to take something informal and formalize it, and some of the features of your formalization don’t seem motivated by the informal version.
This seems basically true to me? I wouldn’t call the economy fragile, because I don’t expect this to happen. Sometimes people say things like this and I get the sense that they do think it means the economy is fragile in some way that I don’t think it is. I think they’re making a mistake, but not about this.
If an informal claim about economics doesn’t translate readily to a formal claim, then it’s not even wrong. Informal language games are fine in many contexts, but Elizabeth’s posts tend to be about careful fact-gathering and scrutinizing sources for accuracy, so I think it’s OK to apply that standard to her work.
This would only be true if
a) The TPE outweighs more fundamental factors, meaning that it’s hard for a lower-ranked choice to become the top choice, merely due to the TPE.
For example, Facebook is useful primarily because so many people are on it. It would be hard for a direct competitor to attract a user base no matter how much better the underlying software is. There is a clear way to rank all the alternatives in terms of quality, but there’s a huge cliff between 1st and 2nd place that exists merely due to the TPE.
b) Random noise outweighs more fundamental factors (or there are no fundamental factors at all), meaning that the differences between lower-ranked choices are obscured by chance. There is no clear way to sort lower-ranked alternatives in terms of quality.
For example, suppose you have no knowledge of horse racing. But you happen to hear that the Mafia has given Black Beauty a drug that makes her 5% faster (TPE), making her slightly more likely to win the Kentucky Derby, but not guaranteeing her victory. If the Mafia changes its mind and decides to drug Seabiscuit instead, there’s no clear reason to expect that they’ll change their mind a third time. Even if they do, there’s no reason to think they’ll change their mind back and drug Black Beauty, rather than doping Secretariat or Man o’ War. The Mafia is unlikely to change its mind, so you assume that Black Beauty has a small but durable edge.
These two examples illustrate that the TPE has several factors.
It can have a large or small importance relative to fundamental factors (large in the case of Facebook, small in the case of Black Beauty). Small relative importance implies fragility; large relative importance implies durability.
It can be endogenous (Facebook’s TPE is due to it having the most users, so the fact that it’s in 1st place helps anchor it in 1st place) or exogenous (Black Beauty’s TPE is due to outside intervention, and it’s not clear how capricious the Mafia will be in changing this choice). Exogenous factors are fragile, endogenous factors are durable.
And the non-TPE fundamentals of the options can be well-ordered (as in the case of Facebook’s software quality relative to its competitors) or unordered (as in the case of the horse racing neophyte at the Kentucky Derby). Unordered fundamentals mean that the choice selected for the top spot is arbitrary, so that having held the top spot confers no special advantage in regaining it if it is lost. Unordered fundamentals imply fragility, well-ordered fundamentals imply durability.
All of these factors vary empirically. No matter whether “fragility” referred to one, two or all three of these factors, it’s clear that fragility and durability exist on a spectrum and may vary widely on a case-by-case basis.
Indeed, Elizabeth says
It’s not clear to me that “self-fulfilling prophecies or network effects” are inherently fragile. The advantage that Facebook gains due to the size of its user base is an example of a network effect that is durable in all three senses of the term as I’ve defined it.
This matters. If we cultivate “TPEs tend to be/are inherently fragile” as a heuristic, it will encourage people to look at problems like “how to make a social media company that’s bigger than Facebook” as a technical problem. “If you build it, they will come.” Well, no, not necessarily. Facebook’s TPE is a big moat. In fact, I’d offer an alternative heuristic:
The more obvious the TPE, the more likely it is to be durable.
Thanks for doing this!
Just as a clarification: None of the three articles you reviewed are curated articles, in the sense that they are not displayed as part of the curated section of the site, and that they have not been sent out to everyone who is subscribed to curated articles.
They are all still reasonably popular posts, so critiquing them still seems good. I am generally happy to see critiques like this, and think you broadly make some decent points (though I have to read what you said in more detail before I can make a better call).
Presumably you found those articles via the “From the Archives” section of the site (which is part of the “Recommended” section, which shows you posts randomly sampled from LessWrong history, proportional to a simple function of their karma (it’s karma raised to some power)). We recently changed the UI around the recommendation section in a way that makes it less obvious where the “From the Archives” section ends and the list of curated posts begins, which is something I’ve been meaning to fix, so maybe that confused you? Or maybe you were just using “curated” in some broader sense, though in that case it still seemed good to clarify for other readers who might be confused.
General inquiry as to level of appetite for this type of criticism, and whether doing such for recent posts would be a positive or negative for those writing.
(Not as a ‘should this have been written?’ but more as a ‘should I/others consider writing more similar posts?’)
On the current margin I’d be interested in more of this.
Oh I was just confused. Thanks for the clarification. SMASH THIS POT.
Sorry for the confusion then! Our current UI sure seems like it would make that confusion likely, so I think of this as mostly my responsibility. I will think about how to preserve the simplicity of the section while also making it clearer that there are two types of posts in there (from the archives and curated).
A couple of initial thoughts I had whilst reading this. Take these more as ponderings on my state of mind than as critiques or corrections.
I find this curiously foreign to my default mode of thinking when reading on LW and elsewhere. It is not uncommon for me to find myself thinking “that seems wrong” and “that seems right” within a single paragraph of content from writers I think are the “rightest”. On the other hand, I usually do not feel as confident about my assessment in either direction as you seem to be in your post.
That being said...
I assume this to be the case with all content, and I’ve always assumed it holds for everyone. It hadn’t occurred to me to think of rationalist content as different in this way, but seeing you state it “out loud” makes me think maybe I should have.
Which question? That of whether the stability of currencies is in part caused by self-fulfilling prophecies? You seem to be saying that self-fulfilling prophecies don’t happen with competent predictors. Do you assert this as a possibility not disproven, or as a fact?
May I ask why you think you “passively consume” LW content? I notice the same behavior in myself, so I’m curious.
P.S. I hope it’s still better than passively consuming most other media.
In one sentence, active reading produces a higher number of reactions per sentence read.
In reading the posts for this exercise, I noticed myself having a far higher number of reactions to the content than normal.
Objection to ‘a’: we observe whether our ex ante heuristics converge on the same ex post predictions that known very powerful predictors do, to check whether we are using good meta-heuristic selection.