Nate Showell

Karma: 426

Nate Showell Jun 14, 2025, 5:31 AM
3 points
2
in reply to: Daniel Kokotajlo’s comment on: Distillation Robustifies Unlearning
Another experiment idea: testing whether the reduction in hallucinations that Yao et al. achieved with unlearning can be made robust.

Nate Showell May 4, 2025, 10:44 PM
3 points
0
on: What’s up with AI’s vision
Do LLMs perform better at games that are later in the Pokémon series? If difficulty interpreting pixel art is what’s holding them back, it would be less of a problem when playing later Pokémon games with higher-resolution sprites.

Nate Showell Apr 26, 2025, 12:21 AM
5 points
0
on: This prompt (sometimes) makes ChatGPT think about terrorist organisations
Have you tried seeing how ChatGPT responds to individual lines of code from that excerpt? There might be an anomalous token in it along the lines of “ petertodd”.

Nate Showell Apr 9, 2025, 6:20 AM
1 point
0
in reply to: Adam Zerner’s comment on: Against podcasts
Occasionally something will happen on the train that I want to hear, like the conductor announcing a delay. But not listening to podcasts on the train has more to do with not wanting to have earbuds in my ears or carry headphones around.

Nate Showell Apr 5, 2025, 9:10 PM
3 points
3
on: Against podcasts
I hardly ever listen to podcasts. Part of this is because I find earbuds very uncomfortable, but the bigger part is that they don’t fit into my daily routines very well. When I’m walking around or riding the train, I want to be able to hear what’s going on around me. When I do chores it’s usually in short segments where I don’t want to have to repeatedly pause and unpause a podcast when I stop and start. When I’m not doing any of those things, I can watch videos that have visual components instead of just audio, or can read interview transcripts in much less time than listening to a podcast would take. The podcast format doesn’t have any comparative advantage for me.

Nate Showell Apr 5, 2025, 4:22 AM
8 points
0
on: Nate Showell’s Shortform
Metroid Prime would work well as a difficult video-game-based test for AI generality.
- It has a mixture of puzzles, exploration, and action.
- It takes place in a 3D environment.
- It frequently involves backtracking across large portions of the map, so it requires planning ahead.
- There are various pieces of text you come across during the game. Some of them are descriptions of enemies’ weaknesses or clues on how to solve puzzles, but most of them are flavor text with no mechanical significance.
- The player occasionally unlocks new abilities they have to learn how to use.
- It requires the player to manage resources (health, missiles, power bombs).
- It’s on the difficult side for human players, but not to an extreme level.
There are no current AI systems that are anywhere close to being able to autonomously complete Metroid Prime. Such a system would probably have to be at or near the point where it could automate large portions of human labor.

Nate Showell Mar 22, 2025, 8:48 PM
3 points
−6
on: They Took MY Job?
I recently read This Is How You Lose the Time War, by Max Gladstone and Amal El-Mohtar, and had the strange experience of thinking “this sounds LLM-generated” even though it was written in 2019. Take this passage, for example:
You wrote of being in a village upthread together, living as friends and neighbors do, and I could have swallowed this valley whole and still not sated my hunger for the thought. Instead I wick the longing into thread, pass it through your needle eye, and sew it into hiding somewhere beneath my skin, embroider my next letter to you one stitch at a time.
I found that passage just by opening to a random page without having to cherry-pick. The whole book is like that. I’m not sure how I managed to stick it out and read the whole thing.
The short story on AI and grief feels very stylistically similar to This Is How You Lose the Time War. They both read like they’re cargo-culting some idea of what vivid prose is supposed to sound like. They overshoot the target of how many sensory details to include, while at the same time failing to cohere into anything more than a pile of mixed metaphors. The story on AI and grief is badly written, but its bad writing is of a type that human authors sometimes engage in too, even in novels like This Is How You Lose the Time War that sell well and become famous.
How soon do I think an LLM will write a novel I would go out of my way to read? As a back-of-the-envelope estimate, such an LLM is probably about as far away from current LLMs in novel-writing ability as current LLMs are from GPT-3. If I multiply the 5 years between GPT-3 and now by a factor of 1.5 to account for a slowdown in LLM capability improvements, I get an estimate of that LLM being 7.5 years away, so around late 2032.
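The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate as stated: the function name `estimate_arrival` and its parameters are hypothetical, the 5-year gap and 1.5 slowdown factor are taken directly from the comment, and the start date is assumed to be the date of this comment.

```python
from datetime import date


def estimate_arrival(start: date, past_gap_years: float, slowdown: float) -> date:
    """Project forward by past_gap_years * slowdown, at whole-month resolution.

    Hypothetical helper: it just scales a past capability gap by a slowdown
    factor and adds the result to the start date.
    """
    months_ahead = round(past_gap_years * slowdown * 12)
    # Count months from year 0 so that year rollover is handled by integer division.
    total_months = start.year * 12 + (start.month - 1) + months_ahead
    return date(total_months // 12, total_months % 12 + 1, 1)


# Assumed inputs: comment date (Mar 22, 2025), 5-year gap, 1.5x slowdown.
estimate = estimate_arrival(date(2025, 3, 22), past_gap_years=5, slowdown=1.5)
print(estimate.year, estimate.month)  # 2032 9
```

With these inputs the projection is 90 months ahead, which lands in September 2032. Changing the slowdown factor shifts the result proportionally, so the estimate is only as good as that assumed factor.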

Nate Showell Mar 21, 2025, 4:08 AM
2 points
0
on: Why White-Box Redteaming Makes Me Feel Weird
As you mentioned at the beginning of the post, popular culture contains examples of people being forced to say things they don’t want to say. Some of those examples end up in LLMs’ training data. Rather than involving consciousness or suffering on the part of the LLM, the behavior you’ve observed has a simpler explanation: the LLM is imitating characters in mind control stories that appear in its training corpus.

Nate Showell Mar 14, 2025, 6:34 AM
4 points
0
in reply to: cubefox’s comment on: Daniel Kokotajlo’s Shortform
There are sea slugs that photosynthesize, but that’s with chloroplasts they steal from the algae they eat.

Nate Showell Mar 9, 2025, 8:33 PM
2 points
0
in reply to: Carl Feynman’s comment on: What is the best / most proper definition of “Feeling the AGI” there is?
As I use the term, the presence or absence of an emotional reaction isn’t what determines whether someone is “feeling the AGI” or not. I use it to mean basing one’s AI timeline predictions on a feeling.

Nate Showell Mar 8, 2025, 10:26 PM
2 points
0
on: What is the best / most proper definition of “Feeling the AGI” there is?
Getting caught up in an information cascade that says AGI is arriving soon. A person who’s “feeling the AGI” has “vibes-based” reasons for their short timelines due to copying what the people around them believe. In contrast, a person who looks carefully at the available evidence and formulates a gears-level model of AI timelines is doing something different than “feeling the AGI,” even if their timelines are short. “Feeling” is the crucial word here.

Nate Showell Mar 2, 2025, 8:13 PM
14 points
1
on: Share AI Safety Ideas: Both Crazy and Not
The phenomenon of LLMs converging on mystical-sounding outputs deserves more exploration. There might be something alignment-relevant happening to LLMs’ self-models/world-models when they enter the mystical mode, potentially related to self-other overlap or to a similar ontology in which the concepts of “self” and “other” aren’t used. I would like to see an interpretability project analyzing the properties of LLMs that are in the mystical mode.

Nate Showell Feb 15, 2025, 10:04 PM
8 points
0
in reply to: tailcalled’s comment on: tailcalled’s Shortform
The question of population ethics can be dissolved by rejecting personal identity realism. And we already have good reasons to reject personal identity realism, or at least consider it suspect, due to the paradoxes that arise in split-brain thought experiments (e.g., the hemisphere swap thought experiment) if you assume there’s a single correct way to assign personal identity.

Nate Showell Feb 14, 2025, 2:04 AM
4 points
0
on: My model of what is going on with LLMs
LLMs are more accurately described as artificial culture than as artificial intelligence. They’ve been able to achieve the things they’ve achieved by replicating the secret of our success, and by engaging in much more extensive cultural accumulation (at least in terms of text-based cultural artifacts) than any human ever could. But cultural knowledge isn’t the same thing as intelligence, hence LLMs’ continued difficulties with sequential reasoning and planning.

Nate Showell Feb 8, 2025, 3:59 AM
4 points
−4
in reply to: Morphism’s comment on: Pi Rogers’s Shortform
On the contrary, convex agents are wildly abundant—we call them r-selected organisms.

Nate Showell Jan 10, 2025, 6:03 AM
3 points
0
on: Rebuttals for ~all criticisms of AIXI
The uncomputability of AIXI is a bigger problem than this post makes it out to be. This uncomputability inserts a contradiction into any proof that relies on AIXI: the same contradiction as in Gödel’s incompleteness theorem. You can get around this contradiction by using approximations of AIXI instead, but the resulting proofs will be specific to those approximations, and you would need to prove additional theorems to transfer results between the approximations.

Nate Showell Dec 28, 2024, 10:28 PM
4 points
0
in reply to: habryka’s comment on: The Field of AI Alignment: A Postmortem, and What To Do About It
Some concrete predictions:
- The behavior of the ASI will be a collection of heuristics that are activated in different contexts.
- The ASI’s software will not have any component that can be singled out as the utility function, although it may have a component that sets a reinforcement schedule.
- The ASI will not wirehead.
- The ASI’s world-model won’t have a single unambiguous self-versus-world boundary. The situational awareness of the ASI will have more in common with that of an advanced meditator than it does with that of an idealized game-theoretic agent.

Nate Showell Dec 26, 2024, 11:20 PM
21 points
−11
on: The Field of AI Alignment: A Postmortem, and What To Do About It
My view of the development of the field of AI alignment is pretty much the exact opposite of yours: theoretical agent foundations research, what you describe as research on the hard parts of the alignment problem, is a castle in the clouds. Only when alignment researchers started experimenting with real-world machine learning models did AI alignment become grounded in reality. The biggest epistemic failure in the history of the AI alignment community was waiting too long to make this transition.
Early arguments for the possibility of AI existential risk (as seen, for example, in the Sequences) were largely based on 1) rough analogies, especially to evolution, and 2) simplifying assumptions about the structure and properties of AGI. For example, agent foundations research sometimes assumes that AGI has infinite compute or that it has a strict boundary between its internal decision processes and the outside world.
As neural networks started to see increasing success at a wide variety of problems in the mid-2010s, it started to become apparent that the analogies and assumptions behind early AI x-risk cases didn’t apply to them. The process of developing an ML model isn’t very similar to evolution. Neural networks use finite amounts of compute, have internals that can be probed and manipulated, and behave in ways that can’t be rounded off to decision theory. On top of that, it became increasingly clear as the deep learning revolution progressed that even if agent foundations research did deliver accurate theoretical results, there was no way to put them into practice.
But many AI alignment researchers stuck with the agent foundations approach for a long time after their predictions about the structure and behavior of AI failed to come true. Indeed, the late-2000s AI x-risk arguments still get repeated sometimes, like in List of Lethalities. It’s telling that the OP uses worst-case ELK as an example of one of the hard parts of the alignment problem; the framing of the worst-case ELK problem doesn’t make any attempt to ground the problem in the properties of any AI system that could plausibly exist in the real world, and instead explicitly rejects any such grounding as not being truly worst-case.
Why have ungrounded agent foundations assumptions stuck around for so long? There are a couple of factors that are likely at work:
- Agent foundations nerd-snipes people. Theoretical agent foundations is fun to speculate about, especially for newcomers or casual followers of the field, in a way that experimental AI alignment isn’t. There’s much more drudgery involved in running an experiment. This is why I, personally, took longer than I should have to abandon the agent foundations approach.
- Game-theoretic arguments are what motivated many researchers to take the AI alignment problem seriously in the first place. The sunk cost fallacy then comes into play: if you stop believing that game-theoretic arguments for AI x-risk are accurate, you might conclude that all the time you spent researching AI alignment was wasted.
Rather than being an instance of the streetlight effect, the shift to experimental research on AI alignment was an appropriate response to developments in the field of AI as it left the GOFAI era. AI alignment research is now much more grounded in the real world than it was in the early 2010s.

Nate Showell Dec 1, 2024, 8:44 PM
1 point
−7
on: Why does ChatGPT throw an error when outputting “David Mayer”?
This looks like it’s related to the phenomenon of glitch tokens:
https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology
https://www.lesswrong.com/posts/f4vmcJo226LP7ggmr/glitch-token-catalog-almost-a-full-clear
ChatGPT no longer uses the same tokenizer that it used when the SolidGoldMagikarp phenomenon was discovered, but its new tokenizer could be exhibiting similar behavior.

Nate Showell Nov 29, 2024, 8:51 PM
3 points
0
on: Is the mind a program?
Another piece of evidence against practical CF is that, under some conditions, the human visual system is capable of seeing individual photons. This finding demonstrates that in at least some cases, the molecular-scale details of the nervous system are relevant to the contents of conscious experience.