
Simulator Theory

Last edit: 14 Feb 2023 17:08 UTC by Jozdien

Simulator theory, in the context of AI, refers to an ontology or frame for understanding the workings of large generative models, such as the GPT series from OpenAI. Broadly, it views these models as simulating a learned distribution with varying degrees of fidelity; in the case of language models trained on a large corpus of text, that distribution reflects the mechanics underlying our world.

It can also refer to an alignment research agenda that deals with better understanding simulator conditionals, the effects of downstream training, alignment-relevant properties of language models such as myopia and agency, and the use of such models as alignment research accelerators. See also: Cyborgism.
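
To make the "simulating a learned distribution" framing concrete, here is a minimal, illustrative sketch (not from the tag text, and deliberately toy-scale): an autoregressive model defines a learned next-token distribution, a prompt supplies the initial condition, and sampling rolls out one trajectory of the simulated process. The bigram model and corpus below are stand-ins for a large language model and its training data.

```python
import random
from collections import Counter, defaultdict

# Toy "training data"; a real simulator would be trained on a large text corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# "Training": estimate p(next token | current token) from the corpus.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(token):
    """Sample one next token from the learned conditional distribution."""
    options = counts[token]
    if not options:
        return None
    tokens, weights = zip(*options.items())
    return random.choices(tokens, weights=weights)[0]

def simulate(prompt, steps=8):
    """The prompt is an initial condition; the rollout is one simulated trajectory."""
    trajectory = prompt.split()
    for _ in range(steps):
        nxt = sample_next(trajectory[-1])
        if nxt is None:
            break
        trajectory.append(nxt)
    return " ".join(trajectory)

# Different prompts condition the same model into different continuations,
# which is the sense in which one model can "simulate" many processes.
print(simulate("the cat"))
print(simulate("the dog"))
```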

Simulators

janus, 2 Sep 2022 12:45 UTC
612 points
162 comments, 41 min read, LW link, 8 reviews
(generative.ink)

Conditioning Predictive Models: Large language models as predictors

2 Feb 2023 20:28 UTC
88 points
4 comments, 13 min read, LW link

The Compleat Cybornaut

19 May 2023 8:44 UTC
65 points
2 comments, 16 min read, LW link

Why Simulator AIs want to be Active Inference AIs

10 Apr 2023 18:23 UTC
91 points
9 comments, 8 min read, LW link, 1 review

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
34 points
4 comments, 39 min read, LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
64 points
30 comments, 11 min read, LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley, 9 Jan 2024 20:42 UTC
47 points
8 comments, 36 min read, LW link

The Waluigi Effect (mega-post)

Cleo Nardo, 3 Mar 2023 3:22 UTC
630 points
187 comments, 16 min read, LW link

Conditioning Generative Models for Alignment

Jozdien, 18 Jul 2022 7:11 UTC
59 points
8 comments, 20 min read, LW link

Simulacra are Things

janus, 8 Jan 2023 23:03 UTC
63 points
7 comments, 2 min read, LW link

‘simulator’ framing and confusions about LLMs

Beth Barnes, 31 Dec 2022 23:38 UTC
104 points
11 comments, 4 min read, LW link

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
50 points
4 comments, 3 min read, LW link

A smart enough LLM might be deadly simply if you run it for long enough

Mikhail Samin, 5 May 2023 20:49 UTC
17 points
16 comments, 8 min read, LW link

Agents vs. Predictors: Concrete differentiating factors

evhub, 24 Feb 2023 23:50 UTC
37 points
3 comments, 4 min read, LW link

Super-Luigi = Luigi + (Luigi - Waluigi)

Alexei, 17 Mar 2023 15:27 UTC
16 points
9 comments, 1 min read, LW link

Using predictors in corrigible systems

porby, 19 Jul 2023 22:29 UTC
19 points
6 comments, 27 min read, LW link

Remarks 1–18 on GPT (compressed)

Cleo Nardo, 20 Mar 2023 22:27 UTC
148 points
35 comments, 31 min read, LW link

Implications of simulators

TW123, 7 Jan 2023 0:37 UTC
17 points
0 comments, 12 min read, LW link

FAQ: What the heck is goal agnosticism?

porby, 8 Oct 2023 19:11 UTC
66 points
36 comments, 28 min read, LW link

RecurrentGPT: a loom-type tool with a twist

mishka, 25 May 2023 17:09 UTC
10 points
0 comments, 3 min read, LW link
(arxiv.org)

Inner Misalignment in “Simulator” LLMs

Adam Scherlis, 31 Jan 2023 8:33 UTC
84 points
12 comments, 4 min read, LW link

Conditioning Predictive Models: Outer alignment via careful conditioning

2 Feb 2023 20:28 UTC
72 points
15 comments, 57 min read, LW link

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments, 10 min read, LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt, 17 Feb 2023 23:34 UTC
81 points
13 comments, 5 min read, LW link

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong, 21 Feb 2023 12:12 UTC
25 points
6 comments, 1 min read, LW link

GPTs are Predictors, not Imitators

Eliezer Yudkowsky, 8 Apr 2023 19:59 UTC
409 points
99 comments, 3 min read, LW link, 3 reviews

One path to coherence: conditionalization

porby, 29 Jun 2023 1:08 UTC
28 points
4 comments, 4 min read, LW link

[Question] Goals of model vs. goals of simulacra?

dr_s, 12 Apr 2023 13:02 UTC
5 points
7 comments, 1 min read, LW link

Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down?

Robert_AIZI, 9 Mar 2023 17:28 UTC
63 points
48 comments, 2 min read, LW link

The algorithm isn’t doing X, it’s just doing Y.

Cleo Nardo, 16 Mar 2023 23:28 UTC
53 points
43 comments, 5 min read, LW link

Collective Identity

18 May 2023 9:00 UTC
59 points
12 comments, 8 min read, LW link

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel, 12 Jul 2023 17:47 UTC
9 points
15 comments, 5 min read, LW link

Unsafe AI as Dynamical Systems

Robert_AIZI, 14 Jul 2023 15:31 UTC
11 points
0 comments, 3 min read, LW link
(aizi.substack.com)

Memetic Judo #3: The Intelligence of Stochastic Parrots v.2

Max TK, 20 Aug 2023 15:18 UTC
8 points
33 comments, 6 min read, LW link

The Löbian Obstacle, And Why You Should Care

lukemarks, 7 Sep 2023 23:59 UTC
18 points
6 comments, 2 min read, LW link

The utility of humans within a Super Artificial Intelligence realm.

Marc Monroy, 11 Oct 2023 17:30 UTC
1 point
0 comments, 7 min read, LW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp, 20 Oct 2023 7:32 UTC
119 points
15 comments, 22 min read, LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien, 2 Dec 2022 16:33 UTC
44 points
8 comments, 5 min read, LW link

Conditioning Generative Models

Adam Jermyn, 25 Jun 2022 22:15 UTC
24 points
18 comments, 10 min read, LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston, 5 Nov 2022 13:19 UTC
8 points
4 comments, 16 min read, LW link

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical

Roman Leventov, 30 Jul 2022 20:56 UTC
24 points
10 comments, 1 min read, LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby, 23 Nov 2022 4:22 UTC
37 points
2 comments, 35 min read, LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC
42 points
3 comments, 5 min read, LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments, 10 min read, LW link

The Limit of Language Models

DragonGod, 6 Jan 2023 23:53 UTC
44 points
26 comments, 4 min read, LW link

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
24 points
23 comments, 13 min read, LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c, 13 Jan 2023 6:33 UTC
15 points
1 comment, 1 min read, LW link

[ASoT] Simulators show us behavioural properties by default

Jozdien, 13 Jan 2023 18:42 UTC
35 points
3 comments, 3 min read, LW link

Underspecification of Oracle AI

15 Jan 2023 20:10 UTC
30 points
12 comments, 19 min read, LW link

Gradient Filtering

18 Jan 2023 20:09 UTC
54 points
16 comments, 13 min read, LW link

Conditioning Predictive Models: The case for competitiveness

6 Feb 2023 20:08 UTC
20 points
3 comments, 11 min read, LW link

Conditioning Predictive Models: Making inner alignment as easy as possible

7 Feb 2023 20:04 UTC
27 points
2 comments, 19 min read, LW link

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments, 11 min read, LW link

Cyborgism

10 Feb 2023 14:47 UTC
337 points
46 comments, 35 min read, LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
134 points
19 comments, 11 min read, LW link

Implied “utilities” of simulators are broad, dense, and shallow

porby, 1 Mar 2023 3:23 UTC
45 points
7 comments, 3 min read, LW link

Instrumentality makes agents agenty

porby, 21 Feb 2023 4:28 UTC
20 points
4 comments, 6 min read, LW link

Situational awareness in Large Language Models

Simon Möller, 3 Mar 2023 18:59 UTC
30 points
2 comments, 7 min read, LW link

A note on ‘semiotic physics’

metasemi, 11 Feb 2023 5:12 UTC
11 points
13 comments, 6 min read, LW link

On the future of language models

owencb, 20 Dec 2023 16:58 UTC
105 points
17 comments, 1 min read, LW link

OpenAI Credit Account (2510$)

Emirhan BULUT, 21 Jan 2024 2:32 UTC
1 point
0 comments, 1 min read, LW link

The case for more ambitious language model evals

Jozdien, 30 Jan 2024 0:01 UTC
110 points
30 comments, 5 min read, LW link

Interview with Robert Kralisch on Simulators

WillPetillo, 26 Aug 2024 5:49 UTC
17 points
0 comments, 75 min read, LW link

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages

Ethan Edwards, 4 Apr 2024 13:18 UTC
24 points
2 comments, 36 min read, LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton, 18 Apr 2024 18:29 UTC
25 points
4 comments, 16 min read, LW link

How are Simulators and Agents related?

Robert Kralisch, 29 Apr 2024 0:22 UTC
6 points
0 comments, 7 min read, LW link

Karpenchuk’s Theory: Human Life as a Simulation for Consciousness Development

Karpenchuk Bohdan, 2 Aug 2024 0:03 UTC
1 point
0 comments, 2 min read, LW link

Using ideologically-charged language to get gpt-3.5-turbo to disobey it’s system prompt: a demo

Milan W, 24 Aug 2024 0:13 UTC
3 points
0 comments, 6 min read, LW link

Early Results: Do LLMs complete false equations with false equations?

Robert_AIZI, 30 Mar 2023 20:14 UTC
14 points
0 comments, 4 min read, LW link
(aizi.substack.com)

ICA Simulacra

Ozyrus, 5 Apr 2023 6:41 UTC
26 points
2 comments, 7 min read, LW link

Alignment of AutoGPT agents

Ozyrus, 12 Apr 2023 12:54 UTC
14 points
1 comment, 4 min read, LW link

Research Report: Incorrectness Cascades

Robert_AIZI, 14 Apr 2023 12:49 UTC
19 points
0 comments, 10 min read, LW link
(aizi.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI, 26 Apr 2023 17:45 UTC
75 points
7 comments, 3 min read, LW link
(aizi.substack.com)

[Question] Impressions from base-GPT-4?

mishka, 8 Nov 2023 5:43 UTC
25 points
25 comments, 1 min read, LW link

Is Interpretability All We Need?

RogerDearnaley, 14 Nov 2023 5:31 UTC
1 point
1 comment, 1 min read, LW link

Simulators Increase the Likelihood of Alignment by Default

Wuschel Schulz, 30 Apr 2023 16:32 UTC
13 points
1 comment, 5 min read, LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI, 9 May 2023 21:54 UTC
9 points
0 comments, 9 min read, LW link
(aizi.substack.com)

Notes on Antelligence

Aurigena, 13 May 2023 18:38 UTC
2 points
0 comments, 9 min read, LW link

The (local) unit of intelligence is FLOPs

boazbarak, 5 Jun 2023 18:23 UTC
42 points
7 comments, 5 min read, LW link

Philosophical Cyborg (Part 1)

14 Jun 2023 16:20 UTC
31 points
4 comments, 13 min read, LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’

lukemarks, 11 Jun 2023 0:13 UTC
22 points
0 comments, 5 min read, LW link

Philosophical Cyborg (Part 2)...or, The Good Successor

ukc10014, 21 Jun 2023 15:43 UTC
21 points
1 comment, 31 min read, LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks, 17 Jun 2023 13:55 UTC
16 points
0 comments, 10 min read, LW link