
Simulator Theory

Last edit: 14 Feb 2023 17:08 UTC by Jozdien

Simulator theory, in the context of AI, refers to an ontology or frame for understanding the workings of large generative models, such as the GPT series from OpenAI. Broadly, it views these models as simulating a learned distribution with varying degrees of fidelity; in the case of language models trained on a large corpus of text, that distribution reflects the mechanics underlying our world.

It can also refer to an alignment research agenda that deals with better understanding simulator conditionals, the effects of downstream training, alignment-relevant properties of language models such as myopia and agency, and the use of such models as alignment research accelerators. See also: Cyborgism.
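
To make the "simulating a learned distribution" framing concrete, here is a minimal, illustrative sketch (not from the tag text, and deliberately toy-scale): an autoregressive model defines a learned next-token distribution, a prompt supplies the initial condition, and sampling rolls out one trajectory of the simulated process. The bigram model and corpus below are stand-ins for a large language model and its training data.

```python
import random
from collections import Counter, defaultdict

# Toy "training data"; a real simulator would be trained on a large text corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": estimate p(next token | current token) from the corpus.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(token):
    """Sample one next token from the learned conditional distribution."""
    options = counts[token]
    if not options:
        return None
    tokens, weights = zip(*options.items())
    return random.choices(tokens, weights=weights)[0]

def simulate(prompt, steps=8):
    """The prompt is an initial condition; the rollout is one simulated trajectory."""
    trajectory = prompt.split()
    for _ in range(steps):
        nxt = sample_next(trajectory[-1])
        if nxt is None:
            break
        trajectory.append(nxt)
    return " ".join(trajectory)

# Different prompts condition the same model into different continuations,
# which is the sense in which one model can "simulate" many processes.
print(simulate("the cat"))
print(simulate("the dog"))
```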

Simulators

janus, 2 Sep 2022 12:45 UTC
612 points
162 comments, 41 min read, LW link, 8 reviews
(generative.ink)

Conditioning Predictive Models: Large language models as predictors

2 Feb 2023 20:28 UTC
88 points
4 comments, 13 min read, LW link

The Compleat Cybornaut

19 May 2023 8:44 UTC
65 points
2 comments, 16 min read, LW link

Why Simulator AIs want to be Active Inference AIs

10 Apr 2023 18:23 UTC
91 points
9 comments, 8 min read, LW link, 1 review

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
34 points
4 comments, 39 min read, LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
64 points
30 comments, 11 min read, LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley, 9 Jan 2024 20:42 UTC
47 points
8 comments, 36 min read, LW link

The Waluigi Effect (mega-post)

Cleo Nardo, 3 Mar 2023 3:22 UTC
630 points
187 comments, 16 min read, LW link

Conditioning Generative Models for Alignment

Jozdien, 18 Jul 2022 7:11 UTC
59 points
8 comments, 20 min read, LW link

Simulacra are Things

janus, 8 Jan 2023 23:03 UTC
63 points
7 comments, 2 min read, LW link

‘simulator’ framing and confusions about LLMs

Beth Barnes, 31 Dec 2022 23:38 UTC
104 points
11 comments, 4 min read, LW link

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
50 points
4 comments, 3 min read, LW link

A smart enough LLM might be deadly simply if you run it for long enough

Mikhail Samin, 5 May 2023 20:49 UTC
17 points
16 comments, 8 min read, LW link

Agents vs. Predictors: Concrete differentiating factors

evhub, 24 Feb 2023 23:50 UTC
37 points
3 comments, 4 min read, LW link

Super-Luigi = Luigi + (Luigi - Waluigi)

Alexei, 17 Mar 2023 15:27 UTC
16 points
9 comments, 1 min read, LW link

Using predictors in corrigible systems

porby, 19 Jul 2023 22:29 UTC
19 points
6 comments, 27 min read, LW link

Remarks 1–18 on GPT (compressed)

Cleo Nardo, 20 Mar 2023 22:27 UTC
148 points
35 comments, 31 min read, LW link

Implications of simulators

TW123, 7 Jan 2023 0:37 UTC
17 points
0 comments, 12 min read, LW link

FAQ: What the heck is goal agnosticism?

porby, 8 Oct 2023 19:11 UTC
66 points
36 comments, 28 min read, LW link

RecurrentGPT: a loom-type tool with a twist

mishka, 25 May 2023 17:09 UTC
10 points
0 comments, 3 min read, LW link
(arxiv.org)

Inner Misalignment in “Simulator” LLMs

Adam Scherlis, 31 Jan 2023 8:33 UTC
84 points
12 comments, 4 min read, LW link

Conditioning Predictive Models: Outer alignment via careful conditioning

2 Feb 2023 20:28 UTC
72 points
15 comments, 57 min read, LW link

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments, 10 min read, LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt, 17 Feb 2023 23:34 UTC
81 points
13 comments, 5 min read, LW link

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong, 21 Feb 2023 12:12 UTC
25 points
6 comments, 1 min read, LW link

GPTs are Predictors, not Imitators

Eliezer Yudkowsky, 8 Apr 2023 19:59 UTC
409 points
99 comments, 3 min read, LW link, 3 reviews

One path to coherence: conditionalization

porby, 29 Jun 2023 1:08 UTC
28 points
4 comments, 4 min read, LW link

[Question] Goals of model vs. goals of simulacra?

dr_s, 12 Apr 2023 13:02 UTC
5 points
7 comments, 1 min read, LW link

Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down?

Robert_AIZI, 9 Mar 2023 17:28 UTC
63 points
48 comments, 2 min read, LW link

The algorithm isn’t doing X, it’s just doing Y.

Cleo Nardo, 16 Mar 2023 23:28 UTC
53 points
43 comments, 5 min read, LW link

Collective Identity

18 May 2023 9:00 UTC
59 points
12 comments, 8 min read, LW link

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel, 12 Jul 2023 17:47 UTC
9 points
15 comments, 5 min read, LW link

Unsafe AI as Dynamical Systems

Robert_AIZI, 14 Jul 2023 15:31 UTC
11 points
0 comments, 3 min read, LW link
(aizi.substack.com)

Memetic Judo #3: The Intelligence of Stochastic Parrots v.2

Max TK, 20 Aug 2023 15:18 UTC
8 points
33 comments, 6 min read, LW link

The Löbian Obstacle, And Why You Should Care

lukemarks, 7 Sep 2023 23:59 UTC
18 points
6 comments, 2 min read, LW link

The utility of humans within a Super Artificial Intelligence realm.

Marc Monroy, 11 Oct 2023 17:30 UTC
1 point
0 comments, 7 min read, LW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp, 20 Oct 2023 7:32 UTC
119 points
15 comments, 22 min read, LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien, 2 Dec 2022 16:33 UTC
44 points
8 comments, 5 min read, LW link

Conditioning Generative Models

Adam Jermyn, 25 Jun 2022 22:15 UTC
24 points
18 comments, 10 min read, LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston, 5 Nov 2022 13:19 UTC
8 points
4 comments, 16 min read, LW link

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical

Roman Leventov, 30 Jul 2022 20:56 UTC
24 points
10 comments, 1 min read, LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby, 23 Nov 2022 4:22 UTC
37 points
2 comments, 35 min read, LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC
42 points
3 comments, 5 min read, LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments, 10 min read, LW link

The Limit of Language Models

DragonGod, 6 Jan 2023 23:53 UTC
44 points
26 comments, 4 min read, LW link

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
24 points
23 comments, 13 min read, LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c, 13 Jan 2023 6:33 UTC
15 points
1 comment, 1 min read, LW link

[ASoT] Simulators show us behavioural properties by default

Jozdien, 13 Jan 2023 18:42 UTC
35 points
3 comments, 3 min read, LW link

Underspecification of Oracle AI

15 Jan 2023 20:10 UTC
30 points
12 comments, 19 min read, LW link

Gradient Filtering

18 Jan 2023 20:09 UTC
54 points
16 comments, 13 min read, LW link

Conditioning Predictive Models: The case for competitiveness

6 Feb 2023 20:08 UTC
20 points
3 comments, 11 min read, LW link

Conditioning Predictive Models: Making inner alignment as easy as possible

7 Feb 2023 20:04 UTC
27 points
2 comments, 19 min read, LW link

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments, 11 min read, LW link

Cyborgism

10 Feb 2023 14:47 UTC
337 points
46 comments, 35 min read, LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
134 points
19 comments, 11 min read, LW link

Implied “utilities” of simulators are broad, dense, and shallow

porby, 1 Mar 2023 3:23 UTC
45 points
7 comments, 3 min read, LW link

Instrumentality makes agents agenty

porby, 21 Feb 2023 4:28 UTC
20 points
4 comments, 6 min read, LW link

Situational awareness in Large Language Models

Simon Möller, 3 Mar 2023 18:59 UTC
30 points
2 comments, 7 min read, LW link

A note on ‘semiotic physics’

metasemi, 11 Feb 2023 5:12 UTC
11 points
13 comments, 6 min read, LW link

On the future of language models

owencb, 20 Dec 2023 16:58 UTC
105 points
17 comments, 1 min read, LW link

OpenAI Credit Account (2510$)

Emirhan BULUT, 21 Jan 2024 2:32 UTC
1 point
0 comments, 1 min read, LW link

The case for more ambitious language model evals

Jozdien, 30 Jan 2024 0:01 UTC
110 points
30 comments, 5 min read, LW link

Interview with Robert Kralisch on Simulators

WillPetillo, 26 Aug 2024 5:49 UTC
17 points
0 comments, 75 min read, LW link

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages

Ethan Edwards, 4 Apr 2024 13:18 UTC
24 points
2 comments, 36 min read, LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton, 18 Apr 2024 18:29 UTC
25 points
4 comments, 16 min read, LW link

How are Simulators and Agents related?

Robert Kralisch, 29 Apr 2024 0:22 UTC
6 points
0 comments, 7 min read, LW link

Karpenchuk’s Theory: Human Life as a Simulation for Consciousness Development

Karpenchuk Bohdan, 2 Aug 2024 0:03 UTC
1 point
0 comments, 2 min read, LW link

Using ideologically-charged language to get gpt-3.5-turbo to disobey it’s system prompt: a demo

Milan W, 24 Aug 2024 0:13 UTC
3 points
0 comments, 6 min read, LW link

Early Results: Do LLMs complete false equations with false equations?

Robert_AIZI, 30 Mar 2023 20:14 UTC
14 points
0 comments, 4 min read, LW link
(aizi.substack.com)

ICA Simulacra

Ozyrus, 5 Apr 2023 6:41 UTC
26 points
2 comments, 7 min read, LW link

Alignment of AutoGPT agents

Ozyrus, 12 Apr 2023 12:54 UTC
14 points
1 comment, 4 min read, LW link

Research Report: Incorrectness Cascades

Robert_AIZI, 14 Apr 2023 12:49 UTC
19 points
0 comments, 10 min read, LW link
(aizi.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI, 26 Apr 2023 17:45 UTC
75 points
7 comments, 3 min read, LW link
(aizi.substack.com)

[Question] Impressions from base-GPT-4?

mishka, 8 Nov 2023 5:43 UTC
25 points
25 comments, 1 min read, LW link

Is Interpretability All We Need?

RogerDearnaley, 14 Nov 2023 5:31 UTC
1 point
1 comment, 1 min read, LW link

Simulators Increase the Likelihood of Alignment by Default

Wuschel Schulz, 30 Apr 2023 16:32 UTC
13 points
1 comment, 5 min read, LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI, 9 May 2023 21:54 UTC
9 points
0 comments, 9 min read, LW link
(aizi.substack.com)

Notes on Antelligence

Aurigena, 13 May 2023 18:38 UTC
2 points
0 comments, 9 min read, LW link

The (local) unit of intelligence is FLOPs

boazbarak, 5 Jun 2023 18:23 UTC
42 points
7 comments, 5 min read, LW link

Philosophical Cyborg (Part 1)

14 Jun 2023 16:20 UTC
31 points
4 comments, 13 min read, LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks, 11 Jun 2023 0:13 UTC
22 points
0 comments, 5 min read, LW link

Philosophical Cyborg (Part 2)...or, The Good Successor

ukc10014, 21 Jun 2023 15:43 UTC
21 points
1 comment, 31 min read, LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks, 17 Jun 2023 13:55 UTC
16 points
0 comments, 10 min read, LW link