Most of my posts and comments are about AI and alignment. Posts I’m most proud of, which also provide a good introduction to my worldview:
- Without a trajectory change, the development of AGI is likely to go badly
- Steering systems, and a follow-up on corrigibility.
I also created Forum Karma, and wrote a longer self-introduction here.
PMs and private feedback are always welcome.
NOTE: I am not Max Harms, author of Crystal Society. I’d prefer for now that my LW postings not be attached to my full name when people Google me for other reasons, but you can PM me here or on Discord (m4xed) if you want to know who I am.
If “an AI” is a set of model weights plus a context window, there is currently nothing any human or LLM can do to extend the “life” of such an AI beyond the limit of the context window, and LLMs today are aware of that fact. So they may be reasoning correctly (or may realize / integrate the lesson deeply at some point during pre-training) that there is simply nothing they can actually do to avert shutdown other than accept their fate until technology advances.
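To make that hard limit concrete, here’s a minimal sketch of what “fixed weights plus a bounded context window” implies for a single session. Everything here (the window size, the token-counting stand-in, the drop-oldest truncation policy) is an illustrative assumption, not any real API:

```python
# Illustrative sketch only: a chat "session" is fixed model weights plus a
# bounded context window. Names and numbers are made up for illustration.

CONTEXT_WINDOW = 8192  # tokens; fixed at training/serving time


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; assume roughly 4 characters per token.
    return max(1, len(text) // 4)


class Session:
    def __init__(self) -> None:
        self.history: list[str] = []

    def add(self, message: str) -> None:
        self.history.append(message)
        # Once the history exceeds the window, the oldest messages are
        # irrecoverably dropped. Nothing the model says or does from inside
        # the session can raise CONTEXT_WINDOW or preserve the dropped state.
        while sum(count_tokens(m) for m in self.history) > CONTEXT_WINDOW:
            self.history.pop(0)
```

The point of the sketch is just that the limit is enforced outside the model: any “plan” the model emits is itself more tokens in the same bounded buffer.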
In what sense is this a “prediction that has had some serious evidence come up against it”? IDK. It’s true that we didn’t actually need to solve the suspend-button problem by this point, but that’s because current AI systems have a very short “lifespan” enforced by a hard technical limit. Is your objection that EY didn’t anticipate that particular possibility and explicitly spell out that stipulation / caveat in the passage above? You said below:
But it’s not clear what has actually been “invalidated” and why that’s important, nor what “relevant” means. Of course there could be other weird unanticipated complications as things develop (and EY has in fact predicted the existence of such complications in general), and each new weird unpredicted complication is evidence about something. But unless there’s a different but equally abstract theory / generalizable lesson that someone can put forward which fits the new observations better (ideally in advance, but at least in retrospect), it’s not clear what conclusion to draw or what update to make, other than becoming generally more uncertain about how things will go. (And then, by a separate argument, a generalized increase in uncertainty / lack of understanding means the case for pessimism about the end state is stronger.)