I don’t think we disagree on culture. I was specifically disagreeing with the claim that Metaculus doesn’t have this problem “because it is not a market and there is no cost to make a prediction”. Your point that culture can override or complement incentives is well made.
The cost to make a prediction is time. The incentive of making it look like “Metaculus thinks X” is still present. The incentive to predict correctly is attenuated to the extent that it’s a long-shot conditional or a far future prediction. So Metaculus can still have the same class of problem.
If anyone’s tracking impact, my family had five people tested in large part due to this post, of whom five were low and started supplementing. We’re not even vegan.
Would you, failing to observe anything on the subject after a couple of hours of Googling, conclude that your civilization must have some unknown good reason why not everyone was doing this already?
No, but not for “Civilizational Adequacy” reasons. In a hypothetical civilization that is Adequate in the relevant ways, but otherwise like ours, I would also not conclude that there was an unknown good reason why not everyone was doing this already. Here’s a simple model to apply for many values of X:
If civilization has an unknown-to-me reason why X is a good idea, I expect to observe search results saying that X is a good idea and giving the reason.
If civilization has an unknown-to-me reason why X is a bad idea, I expect to observe search results saying that X is a bad idea and giving the reason.
If civilization does not know if X is a good or bad idea, I expect to observe no search results, or mixed search results.
I don’t see any way for me to conclude, from a lack of search results, that civilization must have some unknown good reason why not everyone was doing this already.
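As a toy restatement of that model (my own sketch, not something from the post), note that no possible observation maps to “civilization has an unknown good reason against X”:

```python
def what_civilization_knows(search_results: str) -> str:
    """Toy model: infer civilization's view of X from a couple of hours of searching.

    search_results is one of "x_is_good", "x_is_bad", "none_or_mixed" (my own labels).
    """
    if search_results == "x_is_good":
        return "civilization has a known reason why X is a good idea"
    if search_results == "x_is_bad":
        return "civilization has a known reason why X is a bad idea"
    # Note: no branch returns "civilization has an unknown good reason why not
    # everyone is doing X already" -- no observation supports that conclusion.
    return "civilization does not know whether X is a good or bad idea"
```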
I tried to disprove this by thinking of a really random and stupid X. Something so randomly stupid that I could be the first person in civilization to think of it. The idea of “inject bleach to cure covid” was already taken. What if I generalize to injecting some other thing to cure some other thing? My brain generated “cure cancer by injecting hydrogen peroxide”. No, sorry, not random and stupid enough: the internet contains Does Hydrogen Peroxide Therapy Work?, which was the first search result.
More randomness and stupidity needed! How about putting cats in the freezer to reduce their metabolism and therefore save money on cat food? Well, yes, this appears to be a new innovation in the field of random stupid ideas. On the other hand, the first ten search results included Cat survives 19-hour ordeal in freezer, so civilization has a reason why that is a bad idea, and it’s available to anyone who searches for it.
I’m obviously not creative enough so I asked Claude. After a few failed attempts (emojis in variable names! jumping jacks before meetings!) we got to:
Improve remote work meetings by requiring everyone to keep one finger touching their nose at all times while speaking, with their video on. Missing the nose means you lose speaking privileges for 30 seconds.
Success: a truly novel, truly terrible idea. In this case, civilization has multiple good reasons why not everyone is doing this already, but there aren’t any specific search results on this specific idea. Even then, if I spend a couple of hours searching and reading, I’m going to hit some tangential search results that will give me some hints.
I don’t think this proposal satisfies Linearity (sorry, didn’t see kave’s reply before posting). Consider two days, two players.
Day 1:
A ⇒ $200
B ⇒ $0
A + B ⇒ $400
Result: $400 to A, $0 to B.
Day 2:
A ⇒ $100
B ⇒ $100
A + B ⇒ $200
Result: $100 to A, $100 to B.
Combined:
A ⇒ $300
B ⇒ $100
A + B ⇒ $600
So: Synergy(A+B) ⇒ $200
Result: $450 to A, $150 to B. Whereas if you add the results for day 1 and day 2, you get $500 to A, $100 to B.
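For concreteness, here’s a minimal sketch of how I’m reading the proposal: each player keeps their standalone value, and the synergy is split in proportion to standalone values. That reading is my assumption, but it reproduces the numbers above:

```python
def split(day_values):
    """Split the joint value under my reading of the proposal: each player keeps
    their standalone value, and the synergy is divided pro rata by standalone value."""
    vA, vB, vAB = day_values["A"], day_values["B"], day_values["AB"]
    synergy = vAB - vA - vB
    standalone_total = vA + vB
    return (vA + synergy * vA / standalone_total,
            vB + synergy * vB / standalone_total)

day1 = {"A": 200, "B": 0, "AB": 400}
day2 = {"A": 100, "B": 100, "AB": 200}
combined = {"A": 300, "B": 100, "AB": 600}

print(split(day1))      # (400.0, 0.0)
print(split(day2))      # (100.0, 100.0)
print(split(combined))  # (450.0, 150.0), not the day 1 + day 2 total of (500, 100)
```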
This is pretty common in any joint planning exercise. My friend and I are deciding which movie to see together. We share relevant information about what movies are available and what movies we each like and what movies we have seen. We both conclude that this movie here is the best one to see together.
This is excellent. Before reading this post in 2023, I had the confusion described. Roughly, that Aumann agreement is rationally correct, but this mostly doesn’t happen, showing that people mostly aren’t rational. After reading this post, I understood that Aumann agreement is extremely common, and the exceptions where it doesn’t work are best understood as exceptions. Coming back to read it in 2024, it seems obvious. This is a symptom of the post doing its job in 2023.
This is part of a general pattern. When I think that human behavior is irrational, I know nothing. When I see how human behavior can be modeled as rational, I have learned something. Another example is how people play The Ultimatum Game. When I was shown how turning down an “unfair” share can be modeled as a rational response to coercion, I had a better model with better predictions and a better appreciation of my fellow humans.
The post is short, clearly written, seeks to establish a single thing, establishes it, and moves on without drama. Perhaps this is why it didn’t get a lot of engagement when it was posted. The 2023 review is a chance to revisit this.
I could build on this post by describing how Aumann agreement occurs in prediction markets. On Manifold there are frequently markets where some group of people think “90% YES” and others think “90% NO” and there are big feelings. If this persists over a long enough period, with no new evidence coming in, the market settles at some small percentage range with people on both sides hiding behind walls of limit orders and scowling at each other. To some extent this is because both sides have built up whatever positions satisfy their risk tolerance. But a lot of it is the horrible feeling that the worst people in the world may be making great points.
Does this look like a motte-and-bailey to you?
Bailey (1): GPTs are Predictors, not Imitators (nor Simulators).
Motte (2): The training task for GPTs is a prediction task.
The title and the concluding sentence both plainly advocate for (1), but the body of the post doesn’t really argue for it, and I think it’s up for debate (related: reward is not the optimization target). Instead there is an argument for (2). Perhaps the intention of the final sentence was to oppose Simulators? If that’s the case, cite it and be explicit. This could be a really easy thing for an editor to fix.
Does this look like a motte-and-bailey to you?
Bailey (1): The task that GPTs are being trained on is … harder than the task of being a human.
Motte (2): Being an actual human is not enough to solve GPT’s task.
As I read it, (1) is false: the task of being a human doesn’t cap out at human intelligence. More intelligent humans are better at minimizing prediction error, achieving goals, or maximizing inclusive genetic fitness, whatever you might think defines “the task of being a human”. In the comments, Yudkowsky retreats to (2), which is true. But then how should I understand this whole paragraph from the post?
And since the task that GPTs are being trained on is different from and harder than the task of being a human, it would be surprising—even leaving aside all the ways that gradient descent differs from natural selection—if GPTs ended up thinking the way humans do, in order to solve that problem.
If we’re talking about how natural selection trained my genome, why are we talking about how well humans perform the human task? Evolution is optimizing over generations. My human task is optimizing over my lifetime. Also, if we’re just arguing for different thinking, surely it mostly matters whether the training task is different, not whether it is harder?
Overall I think “Is GPT-N bounded by human capabilities? No.” is a better post on the mottes and avoids staking out unsupported baileys. This entire topic is becoming less relevant because AIs are getting all sorts of synthetic data and RLHF and other training techniques thrown at them. The 2022 question of the capabilities of a hypothetical GPT-N that was only trained on the task of predicting human text is academic in 2024. On the other hand, it’s valuable for people to practice on this simpler question before moving on to harder ones.
Does the recent concern about mirror life change your mind? It’s not nano, but it does imply there’s a design space not explored by bio life, which implies there could be others, even if specifically diamonds don’t work.
I enjoyed this but I didn’t understand the choice of personality for Alice and Charlie; it felt distracting. I would have liked A&C to have figured out why this particular Blight didn’t go multi-system.
Playing around with the math, it looks like Shapley Values are not cartel-independent, which fits my prior informal understanding. Consider a lemonade stand where Alice (A) has the only lemonade recipe and Bob (B1) and Bert (B2) have the only lemon trees. Let’s suppose that the following coalitions all make $100 (all others make $0):
A+B1
A+B2
A+B1+B2 (excess lemons get you nothing)
Then the Shapley division is:
A: $66.67 (a 2/3 share)
B1: $16.67 (a 1/6 share)
B2: $16.67 (a 1/6 share)
If Bob and Bert form a cartel/union/merger and split the profits, the game reduces to two players: Alice gets $50, the cartel gets $50, and Bob and Bert each end up with $25. The merger is good for them and bad for Alice.
Previously I was expecting that if there are a large number of Bs and they don’t coordinate, then Alice would get a higher proportion of the profits, which is what we see in real life. This also seems to be the instinct of others (example).
I think I’m still missing something, not sure what.
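For anyone who wants to check the arithmetic, here’s a quick brute-force sketch (my own code, averaging each player’s marginal contribution over all orderings):

```python
from itertools import permutations

def shapley(players, value):
    """Brute-force Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        so_far = set()
        for p in order:
            totals[p] += value(so_far | {p}) - value(so_far)
            so_far.add(p)
    return {p: round(t / len(orderings), 2) for p, t in totals.items()}

# Lemonade stand: Alice's recipe plus at least one lemon tree makes $100.
def lemonade(coalition):
    return 100 if "A" in coalition and ({"B1", "B2"} & coalition) else 0

# The same stand after Bob and Bert merge into a single cartel player.
def lemonade_cartel(coalition):
    return 100 if "A" in coalition and "Cartel" in coalition else 0

print(shapley(["A", "B1", "B2"], lemonade))       # {'A': 66.67, 'B1': 16.67, 'B2': 16.67}
print(shapley(["A", "Cartel"], lemonade_cartel))  # {'A': 50.0, 'Cartel': 50.0}
```

So in this example the cartel moves Bob and Bert from $16.67 each to $25 each, at Alice’s expense.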
A dissenting voice on info-hazards. I appreciate the bulleted list starting from premises and building towards conclusions. Unfortunately I don’t think all the reasoning holds up to close scrutiny. For example, the conclusion that “infohoarders are like black holes for infohazards” conflicts with the premise that “two people can keep a secret if one of them is dead”. The post would have been stronger if it had stopped before getting into community dynamics.
Still, this post moved and clarified my thinking. My sketch at a better argument for a similar conclusion is below:
Definitions:
hard-info-hazard: information that reliably causes catastrophe, no mitigation possible.
soft-info-hazard: information that risks catastrophe, but can be mitigated.
Premises:
Two people can keep a secret if one of them is dead.
If there are hard-info-hazards then we are already extinct, we just don’t know it.
You, by yourself, are not smart enough to tell if an info-hazard is hard or soft.
Authorities with the power to mitigate info-hazards are not aligned with your values.
Possible strategies on discovering an infohazard:
Tell nobody.
Tell everybody.
Follow a responsible disclosure process.
Expected Value calculations left as an exercise for the reader, but responsible disclosure seems favored. The main exception is if we are in Civilizational Hospice where we know we are going extinct in the next decade anyway and are just trying to live our last few years in peace.
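For illustration only, here’s a toy version of that Expected Value comparison under the premises above. Every probability and payoff below is a placeholder I made up, not an estimate:

```python
# Placeholder numbers (made up), comparing strategies on discovering an info-hazard.
P_HARD = 0.1            # chance the hazard is hard (no mitigation possible)
P_MITIGATED = 0.7       # chance responsible disclosure gets a soft hazard mitigated
P_LEAK_IF_SECRET = 0.5  # chance the secret leaks anyway if you tell nobody
CATASTROPHE = -100.0    # utility of catastrophe; a mitigated outcome scores 0

def ev_tell_nobody():
    # Hard hazards doom us regardless (premise 2); kept secrets still leak (premise 1).
    return P_HARD * CATASTROPHE + (1 - P_HARD) * P_LEAK_IF_SECRET * CATASTROPHE

def ev_tell_everybody():
    # Soft hazards spread before anyone can mitigate them.
    return P_HARD * CATASTROPHE + (1 - P_HARD) * CATASTROPHE

def ev_responsible_disclosure():
    # Imperfectly aligned authorities mitigate soft hazards only some of the time (premise 4).
    return P_HARD * CATASTROPHE + (1 - P_HARD) * (1 - P_MITIGATED) * CATASTROPHE

for name, ev in [("tell nobody", ev_tell_nobody()),
                 ("tell everybody", ev_tell_everybody()),
                 ("responsible disclosure", ev_responsible_disclosure())]:
    print(f"{name}: {ev:.0f}")
```

With these placeholders, responsible disclosure comes out ahead, but the conclusion is only as good as the made-up numbers.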
Sometimes when I re-read Yudkowsky’s older writings I am still comfortable with the model and conclusion, but the evidence seems less solid than on first reading. In this post, Matthew Barnett poses problems for the evidence from Japan in Yudkowsky’s Inadequacy and Modesty. Broadly, he argues that Haruhiko Kuroda’s policy was not as starkly beneficial as Yudkowsky claims, although he doesn’t say the policy was a mistake.
LessWrong doesn’t have a great system for handling (alleged) flaws in older posts. Higher rated posts have become more visible with the “enriched” feed, which is good, but there isn’t an active mechanism for revising them in the face of critiques. In this case the author is trying to make our extinction more dignified and revisiting Japan’s economy in 2013 isn’t an optimal use of time. In general, authors shouldn’t feel that posting to LessWrong obliges them to defend their writings in detail years or decades later.
I don’t know that Barnett’s critique is important enough to warrant a correction or a footnote. But it makes me wish for an editor or librarian to make that judgment, or for someone to make a second edition, or some other way that I could recommend “Read the Sequences” without disclaimers.
I’m an epistemically modest person, I guess. My main criticism is one that is already quoted in the text, albeit with more exclamation points than I would use:
You aren’t so specially blessed as your priors would have you believe; other academics already know what you know! Civilization isn’t so inadequate after all!
It’s not just academics. I recall having a similar opinion to Yudkowsky-2013. This wasn’t a question of careful analysis of econobloggers, I just read The Economist, the most mainstream magazine to cover this type of question, and I deferred to their judgment. I started reading The Economist because my school and university had subscriptions. The reporting is paywalled but I’ll cite Revolution in the Air (2013-04-13) and Odd men in (1999-05-13) for anyone with a subscription, or just search for Haruhiko Kuroda’s name.
Japan 2013 monetary policy is a win for epistemic modesty. Instead of reading econblogs and identifying which ones make the most sense, or deciding which Nobel laureates and prestigious economists have the best assessment of the situation, you can just upload conventional economic wisdom into your brain as an impressionable teenager and come to good conclusions.
Disclaimer: Yudkowsky argues this doesn’t impact his thesis about civilizational adequacy, defined later in this sequence. I’m not arguing that thesis here, better to take that up where it is defined and more robustly defended.
I liked this discussion but I’ve reread the text a few times now, and I don’t think this fictional Outcome Pump can be sampling from the quantum wavefunction. The post gives examples that work with classical randomness, and not so much with quantum randomness. Most strikingly:
… maybe a powerful enough Outcome Pump has aliens coincidentally showing up in the neighborhood at exactly that moment.
The aliens coincidentally showing up in the neighborhood is a surprise to the user of the Outcome Pump, but not to the aliens who have been traveling for a thousand years to coincidentally arrive at this exact moment. They could be from the future, but the story allows time rewinding, not time travel. It’s not sampling from the user’s prior, because the user didn’t even consider the gas main blowing up.
I think the simplest answer consistent with the text is that the Outcome Pump is magic, and sampling from what the user’s prior “should be”, given their observations.
Yes, and. The post is about the algorithmic complexity of human values and it is about powerful optimizers (“genies”) and it is about the interaction of those two concepts. The post makes specific points about genies, including intelligent genies, that it would not make if it was not also about genies. Eg:
There are three kinds of genies: Genies to whom you can safely say “I wish for you to do what I should wish for”; genies for which no wish is safe; and genies that aren’t very powerful or intelligent.
You wrote, “the Outcome Pump is a genie of the second class”. But the Time Travel Outcome Pump is fictional. The genie of the second class that Yudkowsky-2007 expects to see in reality is an AI. So the Outcome Pump is part of a parable for this aspect of powerful & intelligent AIs, despite being unintelligent.
There’s lots of evidence I could give here: the tags (“Parables & Fables”), a comment from Yudkowsky-2007 on this post, and the way others have read it, both in the comments and in other posts like Optimality is the Tiger. Also, the Time Travel Outcome Pump is easy to use safely; it’s not the case that “no wish is safe”, and that attitude only makes sense parabolically. I don’t think that’s a valuable discussion topic, and I’m not sure you would even disagree.
However, when reading parables, it’s important to understand what properties transfer and what properties do not. Jesus is recorded as saying “The Kingdom of Heaven is like a pearl of great price”. If I read that and go searching for heaven inside oysters then I have not understood the parable. Similarly, if someone reads this parable and concludes that an AI will not be intelligent then they have not understood the parable or the meaning of AI.
I don’t really see people making that misinterpretation of this post, it’s a pretty farcical take. I notice you disagree here and elsewhere. Given that, I understand your desire for a top-of-post clarification. Adding this type of clarification is usually the job of an editor.
Also, I will refer to them using the name they actually used at that time. (If I talk about the Ancient Rome, I don’t call it Italian Republic either.)
A closer comparison than Ancient Rome is that people of all kinds change their names on occasion, e.g. on marriage, so we have lots of precedent for referring to people whose names have changed. This includes cases where they strongly dislike their former names. Those traditions balance niceness, civilization, rationality, and free speech.
Disclaimer: not a correction, just a perspective.
Thanks for the extra information. Like you, my plans and my planning can be verbal, non-verbal, or a mix.
Why refer to it as a “verbal conscious planner”—why not just say “conscious planner”? Surely the difference isn’t haphazard?
I can’t speak for the author, but thinking of times where I’ve “lacked willpower” to follow a plan, or noticed that it’s “draining willpower” to follow a plan, it’s normally verbal plans and planning. Where “willpower” here is the ability to delay gratification rather than to withstand physical pain. My model here is that verbal plans are shareable and verbal planning is more transparent, so it’s more vulnerable to hostile telepaths and so to self-deception and misalignment. A verbal plan is more likely to be optimized to signal virtue.
Suppose I’m playing chess and I plan out a mate in five, thinking visually. My opponent plays a move that lets me capture their queen but forgoes the mate. I don’t experience “temptation” to take the queen, or have to use “willpower” to press ahead with the mate. Whereas a verbal plan like “I’m still a bit sick, I’ll go to bed early” is more likely to be derailed by temptation. This could of course be confounded by the different situations.
I think you raise a great question, and the more I think about it the less certain I am. This model predicts that people who mostly think visually have greater willpower than those who think verbally. Which I instinctively doubt, it doesn’t sound right. But then I read about the power of visualization and maybe I shouldn’t? Eg Trigger-Action Planning specifically calls out rehearsed visualization as helping to install TAPs.
Thanks for explaining. I now understand you to mean that LessWrong and Lighthaven are dramatically superior to the alternatives, in several ways. You don’t see other groups trying to max out the quality level in the same ways. Other projects may be similar in type, but they are dissimilar in results.
To clarify on my own side, when I say that there are lots of similar projects to Lighthaven, I mean that many people have tried to make conference spaces that are comfortable and well-designed, with great food and convenient on-site accommodation. Similarly, when I say that there are lots of similar projects to LessWrong, I mean that there are many forums with a similar overall design and moderation approach. I wasn’t trying to say that the end results are similar in terms of quality. These are matters of taste, anyway.
Sorry for the misunderstanding.
Agreed. That isn’t a difference between contributing “considerations” and “predictions” (using Habryka’s reported distinction). There are people who contribute good analysis about geopolitics. Others contribute good analysis about ML innovations. Does that transfer to analysis about AGI / ASI? Time will tell—mostly when it’s already too late. We will try anyway.
In terms of predicting the AI revolution, the most important consideration is what will happen to power. Will it be widely or narrowly distributed? How much will be retained by humans? More importantly, can we act in the world to change any of this? These are similar to geopolitical questions, so I welcome analysis and forecasts from people with a proven track record in geopolitics.
The industrial revolution is a good parallel. Nobody in 1760 (let alone 1400) predicted the detailed impacts of the industrial revolution. Some people predicted that population and economic growth would increase. Adam Smith had some insights into power shifts (Claude adds Benjamin Franklin, François Quesnay and James Steuart). That’s about the best I expect to see for the AI revolution. It’s not nothing.