Is the quote here: “we’re here to devour each other alive”?
Thanks for clarifying. By “policy” and “standards” and “compelled speech” I thought you meant something more than community norms and customs. This is traditionally an important distinction to libertarians and free speech advocates. I think the distinction carves reality at the joints, and I hope you agree. I agree that community norms and customs can be unwelcoming.
As described, this type of event would not make me unrestrained in sharing my opinions.
The organizers have additional information regarding what opinions are in the bowl, so they are probably in a position to determine which expressed opinions are genuinely held. This is perhaps solvable, but it doesn’t sound like an attempt was made to solve it. That’s fine if I trust the organizers, but if I trust the organizers to know my opinions then I could just express my opinions to them directly and I don’t need this idea.
I find it unlikely that someone can pass an Ideological Turing Test for a random opinion that they read off a piece of paper a few minutes ago, especially compared to a genuine opinion they hold. It would be rather depressing if they could, because it implies that their genuine opinions have little grounding. An attendee could deliberately downplay their level of investment and knowledge to increase plausible deniability. But such conversations sound unappealing.
There are other problems. My guess is that most of the work was done by filtering for “a certain kind of person”.
Besides, my appeal to authority trumps yours. Yes, they successfully lobbied the American legal system for the title of doctor—arguably this degrades the meaning of the word. Do you take physicians or the American legal system to be the higher authority on matters of health?
The AMA advocates for US physicians, so it has the obvious bias. Adam Smith:
People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices.
I do not consider the AMA an impartial authority on matters such as:
Are chiropractors doctors?
Can AIs give medical advice?
How many new doctors should be trained in the US?
Can nurses safely provide more types of medical care?
How much should doctors be paid?
How much training should new doctors receive?
Should non-US doctors practice medicine in the US?
Should US medical insurance cover medical treatments outside the US?
Should we spend more on healthcare in general?
I therefore tend to hug the query and seek other evidence.
The example here is that I’m working for an NGO that opposes iodizing salt in developing countries because it is racist, for reasons. I’ve been reading online that it raises IQ and that raising IQ is good, actually. I want to discuss this in a safe space.
I can get this safe space from any friends or family who don’t work for the NGO. This seems more likely to work than attending a cancellation party at the NGO. If the NGO prevents me from having outside friends or talking to family, then it’s dangerous and I should get out regardless of its opinion on iodization.
There are better examples; I could offer suggestions if you like, though you can probably think of many yourself.
We can’t reliably kill agents with the St Petersburg Paradox, because if they keep winning we run out of resources and can no longer double their utility. This doesn’t take long: the statistical value of a human life is in the millions, and doubling compounds very quickly.
It’s a stronger argument for Pascal’s Mugging.
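To make “this doesn’t take long” concrete, here is a minimal sketch. The ~$10M statistical value of a life and the ~$100 trillion figure for total world resources are my illustrative assumptions, not numbers from the original discussion:

```python
# Hedged back-of-the-envelope: how many utility doublings before the bet
# exceeds plausible resources? The ~$10M statistical value of a life and
# ~$100 trillion of total world resources are illustrative assumptions.
value_of_life = 10_000_000              # dollars, illustrative
world_resources = 100_000_000_000_000   # ~$100 trillion, illustrative

doublings = 0
stake = value_of_life
while stake < world_resources:
    stake *= 2
    doublings += 1

print(doublings)  # 24: the doubling game hits resource limits almost immediately
```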
Gilliland’s idea is that it is the proportion of trans people that dissuades some right-wing people from joining. That seems plausible to me, it matches the “Big Sort” thesis and my personal experience. I agree that his phrasing is unwelcoming.
I tried to find an official pronoun policy for LessWrong, LessOnline, EA Global, etc, and couldn’t. If you’re thinking of something specific, could you say what? As well as the linked X thread, I have read the X thread linked from Challenges to Yudkowsky’s pronoun reform proposal. But these are the opinions of one person; they don’t amount to politically-coded compelled speech. I’m not part of the rationalist community and this is a genuine question. Maybe such policies exist but are not advertised.
Them: The point of trade is that there are increasing marginal returns to production and diminishing marginal returns to consumption. We specialize in producing different goods, then trade to consume a diverse set of goods that maximizes utility.
Myself: Suppose there were no production possible, just some cosmic endowment of goods that are gradually consumed until everyone dies. Have we gotten rid of the point of trade?
Them: Well if people had different cosmic endowments then they would still trade to get a more balanced set to consume, due to diminishing marginal returns to consumption.
Myself: What if everyone has exactly the same cosmic endowment? And for good measure there are no diminishing returns, the tenth apple produces as much utility as the first.
Them: Well then there’s no trade, what’s the point? We just consume our cosmic endowment until we run out and die.
Myself: What if I like oranges more than apples, and you like apples more than oranges?
Them: Oh. I can trade one of my oranges for one of your apples, and we will both be better off. Darn it.
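A minimal sketch of where the dialogue ends up, with illustrative numbers: identical endowments, linear (non-diminishing) utility, but different tastes still leave gains from trade.

```python
# Identical endowments, linear (non-diminishing) utility, different tastes:
# trade still makes both parties strictly better off. Numbers are illustrative.
endowment = {"apples": 10, "oranges": 10}   # the same cosmic endowment for both

def utility(goods, apple_weight, orange_weight):
    # Linear utility: the tenth apple is worth exactly as much as the first.
    return apple_weight * goods["apples"] + orange_weight * goods["oranges"]

me = dict(endowment)    # I like oranges more than apples
you = dict(endowment)   # You like apples more than oranges

before = utility(me, 1, 2), utility(you, 2, 1)

# Trade one of my apples for one of your oranges.
me["apples"] -= 1;  me["oranges"] += 1
you["apples"] += 1; you["oranges"] -= 1

after = utility(me, 1, 2), utility(you, 2, 1)
print(before, after)  # (30, 30) -> (31, 31): both better off
```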
No, the effect size on bankruptcies is about 10x larger than expected. So while offline gambling may be comparable to alcohol, smartphone gambling is in a different category if we trust this research.
Of course some of those can be influenced by gambling, eg it is a type of overspending. Even so, Claude estimated that legalized online gambling would raise the bankruptcy rate by 2-3% and agreed that 28% is surprising.
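For concreteness, the arithmetic behind “about 10x”, taking the midpoint of the 2-3% estimate as the baseline (the midpoint is my choice):

$$\frac{28\%}{2.5\%} \approx 11$$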
The concept of marriage depends on my internals in that a different human might disagree about whether a couple is married, based on the relative weight they place on religious, legal, traditional, and common law conceptions of marriage. For example, after a Catholic annulment and a legal divorce, a Catholic priest might say that two people were never married, whereas I would say that they were. Similarly, I might say that two men are married to each other, and someone else might say that this is impossible. How quickly those arguments have faded away! I don’t think someone would have used the same example ten years ago.
A potential big Model Delta in this conversation is between Yudkowsky-2022 and Yudkowsky-2024. From List of Lethalities:
The AI does not think like you do, the AI doesn’t have thoughts built up from the same concepts you use, it is utterly alien on a staggering scale. Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
Vs the parent comment:
I think that the AI’s internal ontology is liable to have some noticeable alignments to human ontology w/r/t the purely predictive aspects of the natural world; it wouldn’t surprise me to find distinct thoughts in there about electrons. As the internal ontology goes to be more about affordances and actions, I expect to find increasing disalignment. As the internal ontology takes on any reflective aspects, parts of the representation that mix with facts about the AI’s internals, I expect to find much larger differences—not just that the AI has a different concept boundary around “easy to understand”, say, but that it maybe doesn’t have any such internal notion as “easy to understand” at all, because easiness isn’t in the environment and the AI doesn’t have any such thing as “effort”. Maybe it’s got categories around yieldingness to seven different categories of methods, and/or some general notion of “can predict at all / can’t predict at all”, but no general notion that maps onto human “easy to understand”—though “easy to understand” is plausibly general-enough that I wouldn’t be unsurprised to find a mapping after all.
Yudkowsky is “not particularly happy” with List of Lethalities, and this comment was made a day after the opening post, so neither quote should be considered a perfect expression of Yudkowsky’s belief. In particular the second quote is more epistemically modest, which might be because it is part of a conversation rather than a self-described “individual rant”. Still, the differences are stark. Is the AI utterly, incredibly alien “on a staggering scale”, or does the AI have “noticeable alignments to human ontology”? Are the differences pervasive with “nothing that would translate well”, or does it depend on whether the concepts are “purely predictive”, about “affordances and actions”, or have “reflective aspects”?
The second quote is also less lethal. Human-to-human comparisons seem instructive. A deaf human will have thoughts about electrons, but their internal ontology around affordances and actions will be less aligned. Someone like Eliezer Yudkowsky has the skill of noticing when a concept definition has a step where its boundary depends on your own internals rather than pure facts about the environment, whereas I can’t do that because I project the category boundary onto the environment. Someone with dissociative identities may not have a general notion that maps onto my “myself”. Someone who is enlightened may not have a general notion that maps onto my “I want”. And so forth.
Regardless, differing ontologies are still a clear risk factor. The second quote still modestly allows the possibility of a mind so utterly alien that it doesn’t have thoughts about electrons. And there are 42 other lethalities in the list. Security mindset says that risk factors can combine in unexpected ways and kill you.
I’m not sure if this is an update from Yudkowsky-2022 to Yudkowsky-2024. I might expect an update to be flagged as such (eg “I now think that...” instead of “I think that...”). But Yudkowsky said elsewhere that he has made some positive updates. I’m curious if this is one of them.
Naming: I’ve more commonly heard “anvil problem” to refer to an exploring agent that doesn’t understand that it is part of the environment it is exploring and therefore “drops an anvil on its own head”. See anvil problem tag for more.
Let’s expand on this line of argument and look at your example of bee waggle-dances. You question whether the abstractions represented by the various dances are natural. I agree! Using a Cartesian frame that treats bees and humans as separate agents, not part of Nature, they are not Natural Abstractions. With an Embedded frame they are a Natural Abstraction for anyone seeking to understand bees, but in a trivial way. As you say, “one of the systems explicitly values and works towards understanding the abstractions the other system is using”.
Also, the meter is not a natural abstraction, which we can see by observing other cultures using yards, cubits, and stadia. If we re-ran cultural evolution, we’d expect to see different measurements of distance chosen. The Natural Abstraction isn’t the meter, it’s Distance. Related concepts like relative distance are also Natural Abstractions. If we re-ran cultural evolution, we would still think that trees are taller than grass.
I’m not a bee expert, but Wikipedia says:
In the case of Apis mellifera ligustica, the round dance is performed until the resource is about 10 meters away from the hive, transitional dances are performed when the resource is at a distance of 20 to 30 meters away from the hive, and finally, when it is located at distances greater than 40 meters from the hive, the waggle dance is performed
The dance doesn’t actually mean “greater than 40 meters”, because bees don’t use the metric system. There is some distance, the Waggle Distance, where bees switch from a transitional dance to a waggle dance. Claude says, with low confidence, that the Waggle Distance varies based on energy expenditure. In strong winds, the Waggle Distance goes down.
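As a toy illustration of the threshold idea, collapsing the fuzzy 10-20m and 30-40m gaps into hard cutoffs; the 10m and 40m figures come from the quoted Wikipedia passage, and the headwind adjustment is purely assumed:

```python
# Toy illustration of the Waggle Distance as a threshold, collapsing the fuzzy
# 10-20m and 30-40m gaps into hard cutoffs. The 10m and 40m figures come from
# the quoted Wikipedia passage; the headwind adjustment is purely assumed.
def dance_type(distance_m, headwind=False):
    waggle_distance = 40        # baseline threshold, from the quote
    if headwind:
        waggle_distance = 30    # assumed: more effort lowers the threshold
    if distance_m < 10:
        return "round"
    elif distance_m < waggle_distance:
        return "transitional"
    else:
        return "waggle"

print(dance_type(35), dance_type(35, headwind=True))  # transitional waggle
```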
Humans also have ways of communicating energy expenditure or effort. I don’t know enough about bees or humans to know if there is a shared abstraction of Effort here. It may be that the Waggle Distance is bee-specific. And that’s an important limitation on the NAH: it says, as you quote, that “there exist abstractions which are natural”, but I think we should also believe the Artificial Abstraction Hypothesis, which says that there exist abstractions which are not natural.
This confusion is on display in the discussion around My AI Model Delta Compared To Yudkowsky, where Yudkowsky is quoted as apparently rejecting the NAH:
The AI does not think like you do, the AI doesn’t have thoughts built up from the same concepts you use, it is utterly alien on a staggering scale. Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
But then in a comment on that post he appears to partially endorse the NAH:
I think that the AI’s internal ontology is liable to have some noticeable alignments to human ontology w/r/t the purely predictive aspects of the natural world; it wouldn’t surprise me to find distinct thoughts in there about electrons.
But also endorses the AAH:
As the internal ontology takes on any reflective aspects, parts of the representation that mix with facts about the AI’s internals, I expect to find much larger differences—not just that the AI has a different concept boundary around “easy to understand”, say, but that it maybe doesn’t have any such internal notion as “easy to understand” at all, because easiness isn’t in the environment and the AI doesn’t have any such thing as “effort”.
I appreciate the brevity of the title as it stands. It’s normal for a title to summarize the thesis of a post or paper, and this is also standard practice on LessWrong.
The introductory paragraphs sufficiently described the epistemic status of the author for my purposes. Overall, I found the post easier to engage with because it made its arguments without hedging.
I appreciate the clarity of the pixel game as a concrete thought experiment. Its clarity makes it easier for me to see where I disagree with your understanding of the Natural Abstraction Hypothesis.
The Natural Abstraction Hypothesis is about the abstractions available in Nature, that is to say, the environment. So we have to decide where to draw the boundary around Nature. Options:
Nature is just the pixel game itself (Cartesian)
Nature is the pixel game and the agent(s) flipping pixels (Embedded)
Nature is the pixel game and the utility function(s) but not the decision algorithms (Hybrid)
In the Cartesian frame, “top half”, “bottom half”, “outer rim”, and “middle square” are all Unnatural Abstractions, because they’re not in Nature; they’re in the utility functions.
In the Hybrid and Embedded frames, when System A is playing the game, then “top half” and “bottom half” are Natural Abstractions, but “outer rim” and “middle square” are not. The opposite is true when System B is playing the game.
Let’s make this a multi-player game, and have both systems playing on the same board. In that case all of “top half”, “bottom half”, “outer rim”, and “middle square” are Natural Abstractions. We expect system A to learn “outer rim” and “middle square” as it needs to predict the actions of system B, at least given sufficient learning capabilities. I think this is a clean counter-example to your claim:
Two systems require similar utility functions in order to converge on similar abstractions.
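Here is a minimal sketch of the multi-player case, assuming a 4x4 board and hypothetical utility functions; the original post may define both differently.

```python
# A 4x4 board and hypothetical utility functions; the original post may define
# both differently. The point: once both systems act on the same board, each
# system's abstractions show up in the other system's environment.
import itertools

SIZE = 4
cells = list(itertools.product(range(SIZE), range(SIZE)))

top_half      = {(r, c) for r, c in cells if r < SIZE // 2}
bottom_half   = {(r, c) for r, c in cells if r >= SIZE // 2}
outer_rim     = {(r, c) for r, c in cells if r in (0, SIZE - 1) or c in (0, SIZE - 1)}
middle_square = set(cells) - outer_rim

def utility_A(lit):   # hypothetical: A wants the top half lit and the bottom dark
    return len(lit & top_half) - len(lit & bottom_half)

def utility_B(lit):   # hypothetical: B wants the outer rim lit and the middle dark
    return len(lit & outer_rim) - len(lit & middle_square)

# System A's utility never mentions the rim or the middle, but to predict
# System B's pixel flips it must represent something like outer_rim and
# middle_square: those abstractions are now part of A's environment.
lit_pixels = set()
print(utility_A(lit_pixels), utility_B(lit_pixels))  # 0 0 on an empty board
```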
BLUF: The cited paper doesn’t support the claim that we change our minds less often than we think, and overall it and a paper it cites point the other way. A better claim is that we change our minds less often than we should.
The cited paper is freely downloadable: The weighing of evidence and the determinants of confidence. Here is the sentence immediately following the quote:
It is noteworthy that there are situations in which people exhibit overconfidence even in predicting their own behavior (Vallone, Griffin, Lin, & Ross, 1990). The key variable, therefore, is not the target of prediction (self versus other) but rather the relation between the strength and the weight of the available evidence.
The citation is to Vallone, R. P., Griffin, D. W., Lin, S., & Ross, L. (1990). Overconfident Prediction of Future Actions and Outcomes by Self and Others. Journal of Personality and Social Psychology, 58, 582-592.
Self-predictions are predictions
Occam’s Razor says that our mainline prior should be that self-predictions behave like other predictions. These are old papers and include a small number of small studies, so probably they don’t shift beliefs all that much. However much you weigh them, I think they weigh in favor of Occam’s Razor.
In Vallone 1990, 92 students were asked to predict their future actions later in the academic year, and those of their roommate. An example prediction: will you go to the beach? The greater time between prediction and result makes this a more challenging self-prediction. Students were 78.7% confident and 69.1% accurate for self-prediction, compared to 77.4% confident and 66.3% accurate for other-prediction. Perhaps evidence for “we change our minds more often than we think”.
More striking, I think, is that self- and other-predictions showed a similar overconfidence of about 10 percentage points. They also showed similar patterns of overconfidence: it was clearest when the prediction went against the base rate, and students underweighted the base rate when making both self-predictions and other-predictions.
As well as Occam’s Razor, self-predictions are inescapably also predicting other future events. Consider the job offer case study. Will one of the employers increase the compensation during negotiation? What will they find out when they research the job locations? What advice will they receive from their friends and family? Conversely, many other-predictions are entangled with self-predictions. It’s hard to conceive how we could be underconfident in self-prediction, overconfident in other-prediction, and not notice when the two biases clash.
Short-term self-predictions are easier
In Griffin 1992, the first test of “self vs other” calibration is study 4. This is a set of cooperate/defect tasks where the 24 players predict their future actions and their partner’s future actions. They were 84% confident and 81% accurate in self-prediction but 83% confident and 68% accurate in other-prediction. So they were well-calibrated for self-prediction, and over-confident for other-prediction. Perhaps evidence for “we change our minds as often as we think”.
But self-prediction in this game is much, much easier than other-prediction. 81% accuracy is surprisingly low—I guess that players were choosing a non-deterministic strategy (eg, defect 20% of the time) or were choosing to defect based in part on seeing their partner. But I have a much better idea of whether I am going to cooperate or defect in a game like that, because I know myself a little, and I know other people less.
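To keep the two studies’ numbers straight, here is the overconfidence (stated confidence minus accuracy) in each condition, computed from the figures quoted above:

```python
# Overconfidence = stated confidence minus accuracy, in percentage points,
# using the figures quoted above from Vallone 1990 and Griffin 1992 (study 4).
conditions = {
    "Vallone 1990, self":  (78.7, 69.1),
    "Vallone 1990, other": (77.4, 66.3),
    "Griffin 1992, self":  (84.0, 81.0),
    "Griffin 1992, other": (83.0, 68.0),
}
for name, (confidence, accuracy) in conditions.items():
    print(f"{name}: {confidence - accuracy:+.1f} points")
# Vallone: roughly 10 points overconfident for both self and other.
# Griffin: about 3 points for self (well calibrated), about 15 points for other.
```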
The next study in Griffin 1992 is a deliberate test of the impacts of difficulty on calibration, where they find:
A comparison of Figs. 6 and 7 reveals that our simple chance model reproduces the pattern of results observed by Lichtenstein & Fischhoff (1977): slight underconfidence for very easy items, consistent overconfidence for difficult items, and dramatic overconfidence for “impossible” items.
Self-predictions are not self-recall
If someone says “we change our minds less often than we think”, they could mean one or more of:
We change our minds less often than we predict that we will
We change our minds less often than we model that we do
We change our minds less often than we recall that we did
If an agent has a bad self-model, it will make bad self-predictions (unless its mistakes cancel out). If an agent has bad self-recall it will build a bad self-model (unless it builds its self-model iteratively). But if an agent makes bad self-predictions, we can’t say anything about its self-model or self-recall, because all the bugs can be in its prediction engine.
Instead, Trapped Priors
This post precedes the excellent advice to Hold Off on Proposing Solutions. But the correct basis for that advice is not that “we change our minds less often than we think”. Rather, what we need to solve is that we change our minds less often than we should.
In Trapped Priors as a basic problem of rationality, Scott Alexander explains one model for how we can become stuck with inaccurate beliefs and find it difficult to change our beliefs. In these examples, the person with the trapped prior also believes that they are unlikely to change their beliefs.
The person who has a phobia of dogs believes that they will continue to be scared of dogs.
The Republican who thinks Democrats can’t be trusted believes that they will continue to distrust Democrats.
The opponent of capital punishment believes that they will continue to oppose capital punishment.
Reflections
I took this post on faith when I first read it, and found it useful. Then I realized that, just from the quote, the claimed study doesn’t support the post: people considering two job offers are not “within half a second of hearing the question”. It was that confusion that pushed me to download the paper. I was surprised to find the Vallone citation that led me to draw the opposite conclusion. I’m not quite sure what happened in October 2007 (and “on August 1st, 2003, at around 3 o’clock in the afternoon”). Still, the sequence continues to stand with one word changed from “think” to “should”.
There are several relevant differences. It’s very difficult to spend very large amounts on Taylor Swift tickets while concealing it from your family and friends. There is no promise of potentially winning money by buying Taylor Swift tickets. Spending more money on Taylor Swift tickets gets you more or better entertainment. There is a lower rate of regret by people who spend money on Taylor Swift tickets. Taylor Swift doesn’t make most of her money from a small minority of super whales.
I scored 2200-ish with casual phone play, including repeatedly pressing the wrong button by accident. I’m guessing better play could get someone up to 4,000 or so.
Given the setup, I was sad there wasn’t an explicit target or outcome in terms of how much food was needed to get home safely. I also think a more phone-friendly design would have been nice.
Thanks for making the game!
Yes, the UK govt is sometimes described as “an elected dictatorship”. To the extent this article’s logic applies, it works almost exactly the opposite of the description given.
The winning party is determined by democracy (heavily distorted by first-past-the-post single-winner constituencies).
Once elected, factions within the winning party have the ability to exert veto power in the House of Commons. The BATNA is to bring down the government and force new elections.
The civil service and the judiciary also serve as checks on the executive, as does the UK’s status as a signatory to various international treaties.
Also the UK is easy mode, with a tradition of common law rights stretching back centuries. Many differences with Iraq.