gwern
The weather during Inkhaven was oppressively nice by comparison to most places. It didn’t even start getting meaningfully cold until the end of November.
About you, and not OP?
I was unaware of antisemitic connotations or etymology for ‘AI slop’, but I guess, after searching, that OP is referring to ‘goyslop’ and inferring that that is the etymological root here: https://en.wiktionary.org/wiki/goyslop
I’m not sure I buy that; I thought ‘AI slop’ was coined by anti-corporate leftists on Mastodon or Bluesky, if anything, and the 6 centuries of pejorative ‘slop’ usage (particularly for food in general, not just ‘goyslop’) before any online frogs got to the word seem to me adequate to justify believing it an independent invention, in lieu of direct evidence that ‘AI slop’ was based on ‘goyslop’.
Don’t be absurd. As eminent physicists have argued, influenza viruses fall from space; clusters aren’t due to mutual infection, but to simply having all been at the epicenter of a major fall together.
Since people put them on random web pages (like this one)
In what sense is this page about Gemini benchmarking and LLM canary strings a ‘random web page’ to a Gemini LLM?
Another category of what I suspect are increasing returns comes from ceiling effects: if there is a ceiling, and you hit the ceiling, a rational observer will infer that your true value is noticeably higher than the ceiling. The lower the ceiling, the greater the relative inflation. So you get an effect where returns are diminishing from 97% → 98% → 99%, but then at 99% → 100% there’s a sudden spike.
I’ve wondered if this is responsible for an apparent effect where adding a little bit of polish to a nearly-perfect design has an outsized impact compared to the naive diminishing-returns expectation of that polish being invisible.
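One way to make the ceiling inference concrete is a minimal sketch that assumes the latent value has a Normal(mu, sigma) prior and is measured exactly, except that scores are capped at the ceiling. A score below the ceiling is then taken at face value, while a score at the ceiling only tells the observer that the true value is at least the ceiling, so the estimate becomes the mean of the normal truncated above it. The function names and the example numbers (mu=90, sigma=5) are hypothetical, chosen only for illustration.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_sf(z: float) -> float:
    """Standard normal survival function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def inferred_true_value(observed: float, ceiling: float, mu: float, sigma: float) -> float:
    """Posterior mean of a latent value under an assumed Normal(mu, sigma) prior,
    when scores are exact but capped at `ceiling` (a hypothetical toy model)."""
    if observed < ceiling:
        # Below the ceiling, the score is taken at face value (no measurement noise assumed).
        return observed
    # At the ceiling we only learn that the true value is >= ceiling, so use the mean
    # of the normal truncated to [ceiling, inf): mu + sigma * inverse Mills ratio.
    a = (ceiling - mu) / sigma
    return mu + sigma * normal_pdf(a) / normal_sf(a)


for score in (97, 98, 99, 100):
    print(score, round(inferred_true_value(score, ceiling=100, mu=90, sigma=5), 2))
```

Under these assumptions, each step from 97 to 99 is worth exactly 1 point of inferred value, but a perfect score implies roughly 101.9, so the last percentage point is worth nearly 3. Lowering the ceiling to 95 with the same prior raises the implied excess above the ceiling from about 1.9 to about 2.6.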
‘Not as easy as it used to be’ != ‘infeasible for a stage magician’. (Keep in mind they are well-documented to do things like research audience members in advance just to pull off better cold reads. They only need one thing to succeed. How many ways are there to hack a pinball machine?)
And you’re making a lot of assumptions here about the setup, like having a known device (maybe it’s a video) which has already contacted Wikipedia and has the WP HTTPS cert pinned in cache, and also the user having gone to the right URL/domain in the first place (hey, you know what’s hard to see on current mobile browsers, because all of the tech giants despise URLs and want to eliminate them and keep you in their walled garden?). I just checked on my phone, and if you browse to En, the default Android Chrome browser both omits the ‘https’ and, as soon as you scroll down even slightly, hides the entire URL. The only way I found to easily see the protocol was to edit the URL! It remains quite easy to phish or spoof or cross the ‘line of death’, and people fall for these things all the time. Or, what if the page has been vandalized (vandalism can take a long time to fix, and could have been done by a confederate mere seconds before the audience member checks)? What if it was vandalized and you’re looking at a valid WP mirror which is out of date? What if you’re looking at a specific data-poisoned revision? (Note, by the way, that almost none of these exploits would count for bug bounties from anyone.)
It looks like a lot of people are easily exceeding the 500-word minimum. Maybe that was insufficiently ambitious (given that no one has dropped out yet?), and Inkhaven 2 should bump to 750 or 1000 words. (Note that LLMs are already good at giving feedback and asking questions on essays and suggesting parts to expand on, so adding another 500 words is often quite easy—as long as you are willing to do the work!)
We’re going to produce about 41 * 30 = 1,230 blogposts over the month of November.
A related question would be, what would be the right number of great posts? The kind that might become shorthand or establish a new idea, or be quoted years from now.
I would say that I expect something like 1 in 100 posts to be great. (Imagine a blogger who writes 1 post a week for 2 years. Wouldn’t you expect 1 or 2 really awesome ‘all time’ posts out of those ~100? So then that’s a 1% rate.) That would imply ~12 great posts from Inkhaven (1% of 1,230). That’s assuming no quality/quantity tradeoff: there’s evidence from the ‘equal odds rule’ that historically there is no tradeoff under normal publication environments, but Inkhaven is an unusual and recent environment, so we should expect some tradeoff, and hence I would expect <12 great posts.
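The back-of-envelope arithmetic can be written out as a short sketch. Treating each post as an independent coin flip with a 1% chance of greatness is an added simplifying assumption (the binomial model is not from the original). It gives a rough spread around the ~12 estimate, and it ignores the quality/quantity tradeoff entirely. The function name is hypothetical.

```python
import math


def expected_great_posts(n_posts: int, p_great: float) -> tuple[float, float]:
    """Mean and standard deviation of the number of great posts, treating each
    post as an independent Bernoulli(p_great) trial (a simplifying assumption)."""
    mean = n_posts * p_great
    sd = math.sqrt(n_posts * p_great * (1 - p_great))
    return mean, sd


# Calibrating the rate: 1 post a week for 2 years is ~104 posts,
# so 1 or 2 great posts works out to a rate of roughly 1-2%.
blogger_posts = 1 * 52 * 2
print([round(g / blogger_posts, 4) for g in (1, 2)])

inkhaven_posts = 41 * 30  # 1,230 posts over November
mean, sd = expected_great_posts(inkhaven_posts, 0.01)
print(inkhaven_posts, round(mean, 1), round(sd, 1))
```

Under that model the count would typically land within about ±3.5 of 12.3, roughly 9 to 16, before any quality/quantity tradeoff pulls it down.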
A related question is how we could estimate whether Inkhaven helped. Perhaps we could go back over the edge cases in admission, which prompted some debate and were not clear accept/rejects, and pre-register their names now, before Inkhaven is over, and then have someone blinded look over their writing trajectories or something?
It’s a fanfiction notation: https://en.wikipedia.org/wiki/Exclamation_mark#Internet_culture https://www.angelfire.com/falcon/moonbeam/terms.html#!
(My pet theory is that it’s descended from the original email notation on ARPAnet.)
It is also a simple fact that any exponentially growing technology will be a ‘pop culture’: no one remembers X because they were literally not around then. If we look at how fast investment and market caps and paper count have grown, ‘LLMs’ must have a doubling time under a year. In which case, anything 3 years ago is before the vast majority of people were even interested in LLMs: more than three doublings since then means that fewer than 1 in 8 of today’s interested people were around for it! (Even in AI/tech circles I talk with plenty of people who got into it and started paying attention only post-ChatGPT...) You can’t memory-hole something you never knew.
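The doubling-time argument reduces to one formula. As a sketch, assuming pure exponential growth and that nobody who got interested ever leaves (both simplifications), the population t years ago was 2^(-t/T) of today’s, so the remainder arrived since then. The function name is hypothetical.

```python
def fraction_joined_within(years: float, doubling_time: float) -> float:
    """Share of today's population that arrived within the last `years`, assuming
    pure exponential growth with the given doubling time (in years) and no one leaving."""
    # The population `years` ago was 2**(-years / doubling_time) of today's;
    # everyone else is newer than that.
    return 1 - 2 ** (-years / doubling_time)


for doubling_time in (1.0, 0.75, 0.5):
    print(doubling_time, fraction_joined_within(3, doubling_time))
```

With a one-year doubling time, 87.5% of today’s LLM-watchers arrived within the last 3 years; at a 6-month doubling time it is over 98%.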
A lot of people don’t talk about Sydney for the same reason they don’t talk about Tay, say.
I mean sure, all that could be true. But what reason do you have to think it? Why do you not then use the standard editor?
https://en.wikipedia.org/wiki/Manege_Affair
“Khrushchev walked around the room, went up to Yulo’s blue painting and asked: ‘What is this?’ ‘A lunar landscape,’ Yulo answered. ‘Have you been there, asshole?’ Khrushchev began to yell wildly. And Yulo answered: ‘That’s how I imagine it.’ ‘I’ll send you to the West, formalist, no, no, I’ll deport you, no, I’ll send you to a camp!’ Khrushchev continued to rage. And Yulo answered: ‘I’ve already been there.’ Then Khrushchev said that no, he wouldn’t deport him, but he would re-educate him.”[4]
Don’t worry, we eventually did find the source and context! (see the thread) Just had to be patient and wait to get lucky.
There will be no genius-level insights in 2025, but he could automate a lot of routine alignment work, like evaluating models.
What routine research work of your own have you automated with your digital-me?
Because as far as I can tell, the LLMs don’t seem to train on the Markdown versions of pages
I link the Markdown versions with rel="alternate" link metadata (as well as in-page), but it doesn’t seem to work, so I’ve taken the additional step of serving the Markdown source to HTTP requests which specify that they accept Markdown anywhere in their request as an alternative to HTML. This is a trick which seems increasingly common with LLM agents, since they handle Markdown so much better than HTML gobbledegook, and I hope that it becomes universally used. See https://github.com/gwern/gwern.net/commit/79ded21772a9aa338158c16a19dd4dad5a8f3d6b for details/background/nginx implementation.
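The serving rule can be sketched in Python. This is not the nginx implementation in the linked commit. It assumes the signal arrives in the standard Accept header, uses a deliberately loose substring match, and assumes a hypothetical page layout where each page exists as both `<path>.html` and `<path>.md`. The function names are illustrative.

```python
from typing import Optional, Tuple


def wants_markdown(accept_header: Optional[str]) -> bool:
    """True if the client mentions Markdown anywhere in its Accept header.

    Deliberately loose: it ignores q-values and ordering, so any mention of
    'markdown' (e.g. 'text/markdown' or 'text/x-markdown') selects Markdown.
    """
    if not accept_header:
        return False
    return "markdown" in accept_header.lower()


def choose_variant(path: str, accept_header: Optional[str]) -> Tuple[str, str]:
    """Return (file to serve, Content-Type) for a page, assuming a hypothetical
    layout where every page exists as both `<path>.html` and `<path>.md`."""
    if wants_markdown(accept_header):
        return f"{path}.md", "text/markdown; charset=utf-8"
    return f"{path}.html", "text/html; charset=utf-8"


print(choose_variant("/essay", "text/markdown, text/html;q=0.9"))
print(choose_variant("/essay", "text/html,application/xhtml+xml,*/*;q=0.8"))
```

A server doing this should also send a `Vary: Accept` response header, so that shared caches do not hand the Markdown variant to ordinary browsers.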
Or you could go the other direction and overload in terms of allusion/connotation, which is part of the goal of “October”.
Reviewing my LW posts/comments (any clear flaws, any objections I should pre-empt, how others might respond)
Does Gemini-2.5-pro still work for this given how sycophantic the post-0325 models were?
Even if that were true, it might not mean anything. Why might a country not invest in Y2K prevention? Well, maybe it’s not a problem there! You don’t decide on investments at random, after all.
And this is clearly a case where (1) USA/Western investments would save a lot of other countries the need to invest in Y2K prevention because that is where most software comes from; and (2) those countries might not have the problem in the first place because they computerized later (and skipped the phase of hardwiring in dangerously short data types), or hadn’t computerized at all. (“We don’t have a Y2K problem because we don’t have any computers” doesn’t imply Y2K prevention is a bad idea.)
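To make ‘dangerously short data types’ concrete, here is a toy sketch (not from the original) of the classic two-digit-year bug, plus ‘windowing’, which was one common remediation. The function names and the pivot year of 50 are illustrative assumptions.

```python
def years_between_2digit(start_yy: int, end_yy: int) -> int:
    """Naive interval on two-digit years (e.g. 65 for 1965); breaks at the 1999 -> 2000 rollover."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """'Windowing' patch: read 00-49 as 20xx and 50-99 as 19xx.
    The pivot of 50 is an arbitrary illustrative choice."""
    return 1900 + yy if yy >= pivot else 2000 + yy


print(years_between_2digit(65, 99))      # 1965 -> 1999: correct
print(years_between_2digit(65, 0))       # 1965 -> 2000: negative garbage
print(expand_year(0) - expand_year(65))  # windowed: correct again
```

A system first computerized with four-digit years never needed either the bug or the patch, which is point (2).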
Fulltext: https://gwern.net/doc/psychology/cognitive-bias/2025-kelly.pdf