gwern
What sample of Satoshi writings would you use that o3 wouldn’t already know was written by Satoshi Nakamoto?
Could you explain to me where the single step / multiple steps aspect comes in? I don’t see an assumption of only a single step anywhere, but maybe this comes from a lack of understanding.
Maybe you could explain why you think it covers multiple steps? Like take my falling example. Falling is the outcome of many successive decisions taken by a humanoid body over a few hundred milliseconds. Each decision builds on the previous one, and is constrained by it: you start in a good position and you make a poor decision about some of your joints (like pivoting a little too quickly), then you are in a less-good position and you don’t make a good enough decision to get yourself out of trouble, then you are in an even less good position, and a bunch of decisions later, you are lying on the ground suddenly dying when literally a second ago you were perfectly healthy and might have lived decades more. This is why most regret bounds include a T term in them, which covers the sequential decision-making aspect of RL and how errors can compound: a small deviation from the optimal policy at the start can snowball into arbitrarily large regrets over sufficient T.
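To spell out the compounding with a worked bound (my own gloss here, borrowing the standard behavioral-cloning comparison from imitation learning rather than anything stated in this exchange): if a policy errs with probability ε per step and those errors never push it off-distribution, regret grows only linearly in the horizon, but once a single early error can put it into states it was never trained on, the worst case degrades to quadratic, which is the formal version of one slightly-too-fast pivot snowballing into a broken hip:

```latex
% Hedged sketch, not from the original comment: the standard compounding-error
% comparison (per-step error rate \epsilon, horizon T).
\[
\mathrm{Regret}(T) \;\le\; \epsilon T \quad \text{(errors stay on-distribution)}
\qquad \text{vs.} \qquad
\mathrm{Regret}(T) \;=\; O(\epsilon T^{2}) \quad \text{(each error can derail every later step)}
\]
```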
Standard price-gouging and auction problem. Same reason Taylor Swift doesn’t raise her prices or auction tickets off, but blames her shabbos goy instead.
There’s also the question of what ‘running out of toilet paper’ even means. Back in Covid Round One, there wasn’t a real toilet paper shortage. I didn’t see a single empty shelf of toilet paper. (I did, however, see a giant pyramid of toilet paper rolls at Costco stacked on a pallet/forklift thingy, serving as a reminder that “every shortage is followed by a glut”, but the glut never gets reported.) Maybe it happened in my town in between my visits or in another store, maybe it didn’t, but if I didn’t read social media, I would have no idea there was supposed to be any such thing. Millions of people were not being forced for months to relieve themselves in an open field like they were in India… When people say ‘America ran out of toilet paper’, what they really mean is, ‘I saw a lot of photos on social media of empty shelves and people talking about how late capitalism has failed’ (and at this point, even if they claim to have had toilet paper problems themselves, we know from retrospective memory studies like those of 9/11 that a lot of them didn’t). There’s no obvious way for everyone who sells toilet paper to ensure that there is never an empty shelf anywhere, ever, which would yield a photo to go viral. (Especially not these days when people will just fake the photo with GPT-4o.) And that wouldn’t be economically viable to begin with: stockouts are normal, healthy, and good, because the level of stock which it would take to ensure 0% stockouts would be exorbitantly expensive. To the extent that an empty shelf becomes a political weapon to poison people’s minds, it’s a negative externality to whoever owns and operates that shelf. So that is no reason for them to raise the price of toilet paper enough to ensure it never happens.
(Another point to make would be that people can easily endure any brief fluctuations in toilet paper supply. It’s easy to economize on it, and there are usually a lot of rolls in any household and lots of mediocre substitutes of the sort that were used before ‘toilet paper’ as a specialized category became the norm—like newspapers. There are also paper towel rolls. If it were some big deal, then after the supposed horrible disaster of shortages during COVID, everyone would’ve immediately gone out and bought bidets as the most effective permanent method to insulate themselves from future shortages. But they didn’t.)
My point is that it seems like the Gaussian assumption is obviously wrong given any actual example of a real task, like standing up without falling and breaking a hip or hitting our heads & dying (both of which are quite common in the elderly—eg my grandfather and my grandmother, respectively). And that the analysis is obviously wrong given any actual example of a real environment more complicated than a bandit. (And I think this is part of what Wentworth is getting at when he says it’s about when you “move out of a regime”. The fact that the error inside the ‘regime’ is, if you squint, maybe not so bad in some way, doesn’t help much when the regime is ultra-narrow and you or I could, ahem, fall out of the regime within a second of actions.) So my reaction is that if that is the expected regret in this scenario, which seems to be just about the best possible scenario, with the tamest errors, and the least RL aspects like having multiple steps, that you are showing that Goodhart’s Curse really is that bad, and I’m confused why you seem to think it’s great news.
When you calculate an expected regret here, are you calculating something like a myopic single-step expected regret, one which ignores that utility in states is definitely not Gaussian (when you are standing up, an error in actuating your joints that leads you to fall over will then yield a reward many standard deviations away from a Gaussian you fit to the rewards you experienced in the thousands of seconds before, where you were uneventfully standing in place), and that it will compound over T?
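As a toy numerical version of that worry (entirely my own sketch; the ‘standing’ reward scale, fall penalty, and per-step fall probability are invented numbers): fit a Gaussian to the uneventful rewards, check how many standard deviations out the catastrophic reward sits, and watch the expected loss relative to never falling (a crude stand-in for regret) grow with the horizon T.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rewards while standing uneventfully: small Gaussian jitter around 0 (invented scale).
standing_rewards = rng.normal(loc=0.0, scale=0.1, size=10_000)
mu, sigma = standing_rewards.mean(), standing_rewards.std()

fall_reward = -100.0   # hypothetical catastrophic outcome (broken hip / death)
z = (fall_reward - mu) / sigma
print(f"A fall sits {abs(z):.0f} standard deviations below the Gaussian fit to 'normal' rewards")

# Compounding over T: a tiny per-step chance of the catastrophic error.
p_fall_per_step = 1e-4   # invented per-step probability of a fatal mis-step
for T in (10, 1_000, 100_000):
    p_fallen_by_T = 1 - (1 - p_fall_per_step) ** T
    expected_loss = p_fallen_by_T * abs(fall_reward)
    print(f"T={T:>7}: P(fallen by T) = {p_fallen_by_T:.3f}, expected loss ~ {expected_loss:.1f}")
```

The single-step number looks reassuringly tiny; the long-horizon number does not.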
I think the AI angle here is unhelpful. You should write this post with other less controversial technologies: “where did all the steam engines go? Where did all the vaccines go? Where did all the limited-liability corporations go? Where did all the telephones go? Where did all the automobiles, trains, and tankers go?” And then once you have discussed those, you may no longer feel any need to ask, “where did all the ChatGPT tokens go?”
Well, in the chess example we do not have any obvious map/territory relation.
Yes, there is. The transcripts are of 10 million games that real humans played to cover the distribution of real games, and then were annotated by Stockfish, to provide superhuman-quality metadata on good vs bad moves. That is the territory. The map is the set of transcripts.
But if you can understand text from form alone, as LLMs seem to prove, the message simply has to be long enough.
I would say ‘diverse enough’, not ‘long enough’. (An encyclopedia will teach a LLM many things; a dictionary the same length, probably not.) Similar to meta-learning vs learning.
the pieces do not seem to refer to anything in the external world.
What external world does our ‘external world’ itself refer to things inside of? If the ‘external world’ doesn’t need its own external world for grounding, then why does lots of text about the external world not suffice? (And if it does, what grounds that external external world, or where does the regress end?) As I like to put it, for an LLM, ‘reality’ is just the largest fictional setting—the one that encompasses all the other fictional settings it reads about from time to time.
As someone who doubtless does quite a lot of reading about things or writing to people you have never seen nor met in real life and have no ‘sensory’ way of knowing that they exist, this is a position you should find sympathetic.
I don’t think there’s any necessary contradiction. Verification or prediction of what? More data. What data? Data. You seem to think there’s some sort of special reality-fluid which JPEGs or MP3s have but .txt files do not, but they don’t; they all share the Buddha-nature.
Consider Bender’s octopus example, where she says that it can’t learn to do anything from watching messages go back and forth. This is obviously false, because we do this all the time; for example, you can teach a LLM to play good chess simply by watching a lot of moves fly by back and forth as people play postal chess. Imitation learning & offline RL are important use-cases of RL, and no one would claim they don’t work or are impossible in principle.
Can you make predictions and statements which can be verified by watching postal chess games? Of course. Just predict what the next move will be. “I think he will castle, instead of moving the knight.” [later] “Oh no, I was wrong! I anticipated seeing a castling move, and I did not, I saw something else. My beliefs about castling did not pay rent and were not verified by subsequent observations of this game. I will update my priors and do better next time.”
ChatGPT was “so good they can’t ignore you”; the Hugging Face anecdote is particularly telling. At some point, everyone else gets tired of waiting for your cargo to land, and will fire you if you don’t get with the program. “You say semantics can never be learned from syntax and you’ve proven that ChatGPT can never be useful? It seems plenty useful to me and everyone else. Figure it out or we’ll find someone who can.”
I think it’s still sycophantic compared to hardcore STEM circles where we regard criticism as a bloodsport and failing to find fault in something as defeat. But it’s much less so than the more relevant comparison, which is other LLMs, and in an absolute sense it’s at a level where it’s hard to distinguish from reasonable opinions and doesn’t seem to be getting in the way too much. As davidad notes, it’s still at a level where you can sense its reluctance or if it’s shading things to be nice, and that is a level where it’s just a small quirk and something you can work around easily.
To give a concrete example: I finish writing a review of a movie I watched the other day, and I hand it to Gemini-2.5-pro:
Gemini-2.5-pro review of first draft of Sake review
Okay, let’s break down this review of The Birth of Saké.
Overall Impression:
This is a well-written, engaging, and characteristically Gwern-esque review. It has a strong, distinct voice, a clear central argument, and uses vivid descriptions and rhetorical questions effectively. It goes beyond a simple summary to offer a provocative interpretation focused on the perceived pointlessness and human cost of the tradition depicted. The comparison to Jiro Dreams of Sushi serves as a useful framing device.
Strengths:
Clear Voice and Perspective: The review is unmistakably written by Gwern, with the analytical, slightly cynical, and utilitarian perspective evident throughout. This consistency is a strength.
Strong Central Argument: The core thesis – questioning the value and purpose of the intense, sacrificial labor involved in traditional sake brewing, especially given its declining popularity and the reviewer’s own dislike of the product – is forcefully argued and supported with evidence from the documentary (harsh conditions, worker’s death, market decline).
Engaging Writing: Phrases like “miserably picturesque,” “sadistically horrible,” and the questioning (“Why? Why do all this?”) keep the reader engaged. The description of the working conditions is particularly effective in conveying the film’s atmosphere and the reviewer’s reaction.
Effective Use of Comparison: Contrasting the film with Jiro Dreams of Sushi highlights key differences in the subject matter’s popularity and the perceived passion (or lack thereof) involved, strengthening the review’s central point about sake’s apparent niche status and the questionable nature of the sacrifice.
Good Structure: The review flows logically from introduction and context (how the reviewer came to watch it) through observations about cinematography and working conditions, to the core argument about purpose and value, and finally to a concluding thought.
Acknowledging Counterarguments (Implicitly): While strongly critical, the review acknowledges the potential admiration for “the work,” “craftsmanship,” and “art,” even as it ultimately dismisses these in favor of a more pragmatic assessment.
Honesty: The reviewer is upfront about their lack of interest in sake, which contextualizes their perspective. The Tubi/uBlock anecdote adds a relatable, slightly amusing touch.
Self-Aware Ending: The parenthetical ending is a clever, self-deprecating touch that fits the overall tone.
Areas for Critique / Potential Weaknesses:
Subjectivity as a Potential Limitation: While honesty about disliking sake is good, this strong pre-existing bias heavily colors the interpretation. The review fundamentally judges the activity based on the reviewer’s own lack of appreciation for the end product and a perceived lack of external validation (mass market appeal). It risks dismissing the internal motivations, cultural significance, or sense of community/identity that the participants might derive from the process, even if not explicitly articulated in a way the reviewer finds convincing.
Dismissal of Tradition/Cultural Context: The review questions the “why” primarily through a modern, somewhat Western, utilitarian lens. While the Meiji era isn’t ancient history like the Ise Shrine, 150 years is still a significant period for a family business in any culture. The review might underplay the weight of legacy, duty, local identity, and the potential intrinsic value placed on maintaining any tradition, regardless of its “rational” justification or global popularity. The focus is almost entirely on the negative aspects of the tradition (the labor conditions) without deeply exploring why it might persist beyond inertia or lack of options.
Interpretation of Passion/Motivation: The assertion that “no one really seems to care about it” or lacks “genuine enthusiasm” could be a projection or misreading. Passion can manifest differently across cultures and personalities. Dedication, acceptance of duty, quiet pride in craft, or commitment to community might be present but not expressed as overt “otaku”-level enthusiasm. The heir’s marketing efforts, even if seen as necessary for survival, could also stem from a deep sense of responsibility and, yes, a form of passion for preserving the family legacy.
The “Pointlessness” Argument: While powerfully made, the argument that the work is pointless because the product isn’t widely loved or “objectively” great (according to the reviewer) is a very specific philosophical stance. Many crafts, arts, and traditions exist without mass appeal or clear economic justification. Is the value solely determined by market size or the reviewer’s palate? The review doesn’t leave much room for alternative value systems.
Slight Overemphasis on Negativity?: While the film apparently provides ample evidence for the harshness and decline, the review focuses so heavily on this that it might overshadow other potential aspects or nuances the documentary presented (e.g., moments of camaraderie, pride in specific steps, the beauty of the process itself beyond just “miserably picturesque”).
Conclusion:
This is a compelling and thought-provoking review that successfully articulates a strong, critical perspective on The Birth of Saké. It excels in its clear voice, structured argument, and engaging writing. Its main potential weakness lies in the heavy filter of the reviewer’s personal taste and utilitarian viewpoint, which may lead to a dismissal of cultural nuances, alternative motivations, or the intrinsic value systems that might underpin the traditions depicted. It’s a review that tells you as much about the reviewer’s worldview as it does about the film, which, for this particular author, is often part of the appeal. It definitely makes the reader think, even if they might disagree with the core conclusion about the “pointlessness” of the endeavor.
Particularly characteristic here is
The review is unmistakably written by Gwern, with the analytical, slightly cynical, and utilitarian perspective evident throughout. This consistency is a strength.
I’ve started to wince a bit when Gemini-2.5-pro critiques a draft and throws in a descriptive note along the lines of ‘classic Gwern style: just slightly cynical’. It feels too much like it’s trying to suck up to me. (‘Wow, Gwern, you’re so edgy, but in a good way!’) And yet… it’s not wrong, is it? I do try to avoid being overly cynical or dark, but you certainly couldn’t describe my writing as ‘slightly idealistic’ either; so, ‘slightly cynical’.
And otherwise, the description and critique seem largely correct and are not really mincing any words in the criticisms—even if I didn’t wind up making many changes based on it. So it’s more useful than Claude or 4o would be. (o3 is also good in this regard but the confabulation issue is a big problem, eg. So I just recommend 2.5-pro right now.)
I think it will be even less difference between 2020 and 2040.
In light of the subsequent 5 years of developments like Ukraine, with 15 to go, do you still stand by this claim?
This is an alarming point, as I find myself thinking about the DA today as well; I thought I was ‘gwern’, but it is possible I am ‘robo’ instead, if robo represents such a large fraction of LW-DA observer-moments. It would be bad to be mistaken about my identity like that. I should probably generate some random future dates and add them to my Google Calendar to check whether I am thinking about the DA that day and so have evidence I am actually robo instead.
I’d estimate approximately 12-15 direct meta-responses to your post within the next month alone, and see no reason to expect the exponential to turn sigmoid in timescales that render my below argument unlikely.
However, you can’t use this argument because unlike the MLWDA, where I am arguably a random observer of LW DA instances (the thought was provoked by Michael Nielsen linking to Cosma Shalizi’s notes on Mesopotamia and me thinking that the temporal distances are much less impressive if you think of them in terms of ‘nth human to live’, which immediately reminded me of DA and made me wonder if anyone had done a ‘meta-DA’, and LW simply happened to be the most convenient corpus I knew of to accurately quantify ‘# of mentions’ as tools like Google Scholar or Google N-Grams have a lot of issues—I have otherwise never taken much of an interest in the DA and AFAIK there have been no major developments recently), you are in a temporally privileged position with the MMLWDA, inasmuch as you are the first responder to my MLWDA right now, directly building on it in a non-randomly-chosen-in-time fashion.
Thus, you have to appeal purely to non-DA grounds like making a parametric assumption or bringing in informative priors from ‘similar rat and rat adjacent memes’, and that’s not a proper MMLWDA. That’s just a regular old prediction.
Turchin actually notes this issue in his paper, in the context of, of course, the DA and why the inventor Brandon Carter could not make a Meta-DA (but he and I could):
The problem is that if I think that I am randomly chosen from all DA-Doomers, we get very strong version of DA, as ‘DA-Doomers’ appeared only recently and thus the end should be very soon, in just a few decades from now. The first member of the DA-Doomers reference class was Carter, in 1973, joined by just a few of his friends in the 1980s. (It was rumored that Carter recognized the importance of DA-doomers class and understood that he was first member of it – and thus felt that this “puts” world in danger, as if he was the first in the class, the class is likely to be very short. Anyway, his position was not actually random as he was the first discoverer of the DA).
The Meta-LessWrong Doomsday Argument (MLWDA) predicts long AI timelines and that we can relax:
LessWrong was founded in 2009 (16 years ago), and there have been 44 mentions of the ‘Doomsday argument’ prior to this one, and it is now 2025, at 2.75 mentions per year.
By the Doomsday argument, we medianly-expect mentions to stop after 44 additional mentions over 16 additional years, i.e. in 2041. (And our 95% CI on that 44 would then be +1 mention to +1,760 mentions, corresponding to late-2027 AD to 2665 AD.)
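(For the curious, a minimal sketch of the Gott-style arithmetic behind those numbers, as I would compute it; the exact 95% endpoints depend on which convention you pick, so it only roughly reproduces the CI above, though the median lands on the same 2041.)

```python
# Hedged sketch of the Gott 'delta t' calculation for the MLWDA (my own illustration).
mentions, years, now = 44, 16, 2025
rate = mentions / years                        # = 2.75 mentions per year

# Median (Copernican) estimate: with 50% probability, at least as much lies ahead as behind.
print("median end year:", now + mentions / rate)   # -> 2041

# 95% interval: assume we observe somewhere in the middle 95% of all mentions,
# i.e. the fraction already seen is between 2.5% and 97.5% of the eventual total.
for f in (0.975, 0.025):
    total = mentions / f
    additional = total - mentions
    print(f"f={f}: +{additional:.0f} mentions, ending ~{now + additional / rate:.0f} AD")
```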
By a curious coincidence, double-checking to see if really no one had made a meta-DA before, it turns out that Alexey Turchin made a meta-DA as well, about 7 years ago, calculating that
If we assume 1993 as the beginning of a large DA-Doomers reference class, and it is 2018 now (at the moment of writing this text), the age of the DA-Doomers class is 25 years. Then, with 50% probability, the reference class of DA-Doomers will disappear in 2043, according to Gott’s equation! Interestingly, the dates around 2030–2050 appear in many different predictions of the singularity or the end of the world (Korotayev 2018; Turchin & Denkenberger 2018b; Kurzweil 2006).
His estimate of 2043 is surprisingly close to 2041.
We offer no explanation as to why this numerical consilience of meta-DA calculations has happened; we attribute their success, as all else, to divine benevolence.
Regrettably, the 2041–2043 date range would seem to imply that it is unlikely we will obtain enough samples of the MLWDA in order to compute a Meta-Meta-LessWrong Doomsday Argument (MMLWDA) with non-vacuous confidence intervals, inasmuch as every mention of the MLWDA would be expected to contain a mention of the DA as well.
The latter, especially, I thought was popularized because it was so surprisingly good at improving benchmark performance.
Inner-monologue is an example because as far as we know, it should have existed in pre-GPT-3 models and been constantly improving, but we wouldn’t have noticed because no one would have been prompting for it and if they had, they probably wouldn’t have noticed it. (The paper I linked might have demonstrated that by finding nontrivial performance in smaller models.) Only once it became fairly reliable in GPT-3 could hobbyists on 4chan stumble across it and be struck by the fact that, contrary to what all the experts said, GPT-3 could solve harder arithmetic or reasoning problems if you very carefully set it up just right as an elaborate multi-step process instead of what everyone did, which was just prompt it for the answer right away.
Saying it doesn’t count because once it was discovered it was such a large real improvement, is circular and defines away any example. (Did it not improve benchmarks once discovered? Then who cares about such an ‘uncoupled’ capability; it’s not a real improvement. Did it subsequently improve benchmarks once discovered? Then it’s not really an example because it’s ‘coupled’...) Surely the most interesting examples are ones which do exactly that!
And of course, now there is so much discussion, and so many examples, and it is in such widespread use, and has contaminated all LLMs being trained since, that they start to do it by default given the slightest pretext. The popularization eliminated the hiddenness. And here we are with ‘reasoning models’ which have blown through quite a few older forecasts and moved timelines earlier by years, to the extent that people are severely disappointed when a model like GPT-4.5 ‘only’ does as well as the scaling laws predicted and they start predicting the AI bubble is about to pop and scaling has been refuted.
would also be useful for accurately finishing a sentence starting with “Eliezer Yudkowsky says...”.
But that would be indistinguishable from many other sources of improvement. For starters, by giving a name, you are only testing one direction: ‘name → output’; truesight is about ‘name ← output’. The ‘reversal curse’ is an example of how such inference arrows are not necessarily bidirectional and do not necessarily scale much. (But if you didn’t know that, you would surely conclude the opposite.) There are many ways to improve performance of predicting output: better world-knowledge, abstract reasoning, use of context, access to tools or grounding like web search… No benchmark really distinguishes between these such that you could point to a single specific number and say, “that’s the truesight metric, and you can see it gets better with scale”.
Musk has now admitted his link penalty is not ‘merely’ a simple fixed penalty on the presence of a link or anything like that, but about as perverse as is possible:
To be clear, there is no explicit rule limiting the reach of links in posts.
The algorithm tries (not always successfully) to maximize user-seconds on 𝕏, so a link that causes people to cut short their time here will naturally get less exposure.
Best to post a text/image/video summary of what’s at the link for people to view and then decide if they want to click the link.
So, the higher-quality a link is, and the more people & time spent reading it, the more ‘the algorithm’ punishes it. The worse a link is (the shorter, more trivial, more clickbaity, the less worth reading), the less the algorithm punishes it and the more it rewards it with virality. This explains a lot about Twitter these days.
(This also implies that it may be a bit hard to estimate ‘the’ link penalty, if the algorithm is doing anything to estimate the quality of a link so as to punish good ones more.)
Kodo here is definitely a reference to “Kōdō” (random Knuth). I believe Duncan has written in the past about taking up perfume/tasting comparison as a hobby, hasn’t he?
That is my theory too; see “Why So Few Matt Levines?”
Also I suspect that there is some astronomically high k such that monkeys at a keyboard (i.e. “output random tokens”) will outperform base models for some tasks by the pass@k metric.
It would be an extreme bias-variance tradeoff, yes.
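To put numbers on that tradeoff (my own sketch; both per-sample success probabilities are invented for illustration): pass@k only requires one of the k samples to pass, so a maximally high-variance policy with any nonzero chance of emitting a correct answer eventually overtakes a low-variance policy that deterministically emits the same wrong answer.

```python
def pass_at_k(p_success_per_sample: float, k: int) -> float:
    """Probability that at least one of k independent samples passes."""
    return 1 - (1 - p_success_per_sample) ** k

p_monkeys = 1e-9   # hypothetical: random tokens almost never pass, but occasionally do
p_biased = 0.0     # hypothetical: a model that always emits the same wrong answer

for k in (10**3, 10**6, 10**9, 10**10):
    print(f"k={k:>14,}: monkeys={pass_at_k(p_monkeys, k):.4f}  biased-model={pass_at_k(p_biased, k):.4f}")
```

The crossover k is astronomical whenever the monkeys’ per-sample probability is tiny, but it always exists so long as that probability is nonzero.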
Maybe it would be helpful to start using some toy models of DAGs/tech trees to get an idea of how wide/deep ratios affect the relevant speedups. It sounds like, so far, much of this is just people having warring intuitions about ‘no, the tree is deep and narrow and so slowing down/speeding up workers doesn’t have that much effect because Amdahl’s law so I handwave it at ~1x speed’ vs ‘no, I think it’s wide and lots of work-arounds to any slow node if you can pay for the compute to bypass them and I will handwave it at 5x speed’.
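For instance, a minimal toy model of the kind I mean (all numbers invented; ‘workers’ stands in for whatever parallel researcher-equivalents or compute you can throw at the tree, and each layer must finish before the next can start):

```python
import math

def completion_time(depth: int, width: int, n_workers: int) -> int:
    """Toy tech tree: `depth` sequential layers, each holding `width` independent
    unit-cost tasks; a layer cannot start until the previous layer is finished."""
    return depth * math.ceil(width / n_workers)

# The same 1,000 tasks arranged in different shapes: what does going
# from 10 to 100 parallel workers buy you?
for depth, width in [(1000, 1), (100, 10), (10, 100), (1, 1000)]:
    speedup = completion_time(depth, width, 10) / completion_time(depth, width, 100)
    print(f"depth={depth:>4} x width={width:>4}: 10x more workers -> {speedup:.0f}x faster")
```

The identical amount of total work yields anywhere from no speedup at all to the full 10x, depending purely on the wide/deep ratio, which is exactly the crux the warring intuitions are fighting over.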