Deception

Tag · Last edit: Feb 8, 2023, 2:23 PM by Roman Leventov

Deception is the act of sharing information in a way that intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Maybe Lying Can’t Exist?!

Zack_M_Davis

Aug 23, 2020, 12:36 AM

58 points

16 comments · 5 min read · LW link

Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

Zack_M_Davis

Mar 2, 2024, 10:05 PM

29 points

25 comments · 58 min read · LW link

(unremediatedgender.space)

Algorithms of Deception!

Zack_M_Davis

Oct 19, 2019, 6:04 PM

24 points

7 comments · 5 min read · LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Simon Goldstein and Peter S. Park

Aug 29, 2023, 1:29 AM

54 points

3 comments · 10 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley

Dec 18, 2023, 8:12 AM

30 points

14 comments · 9 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis

Feb 12, 2023, 5:30 AM

112 points

33 comments · 3 min read · LW link · 1 review

Deconfusing Deception

J Bostock

Jan 29, 2022, 4:43 PM

28 points

6 comments · 2 min read · LW link

Deep Deceptiveness

So8res

Mar 21, 2023, 2:51 AM

251 points

60 comments · 14 min read · LW link · 1 review

LCDT, A Myopic Decision Theory

adamShimi and evhub

Aug 3, 2021, 10:41 PM

57 points

50 comments · 15 min read · LW link

On hiding the source of knowledge

jessicata

Jan 26, 2020, 2:48 AM

120 points

40 comments · 3 min read · LW link

(unstableontology.com)

[Book Review] “Houdini on Magic” by Harry Houdini

lsusr

Sep 29, 2021, 2:37 AM

21 points

1 comment · 6 min read · LW link

Comment on “Deception as Cooperation”

Zack_M_Davis

Nov 27, 2021, 4:04 AM

23 points

4 comments · 7 min read · LW link

On Bounded Distrust

Zvi

Feb 3, 2022, 2:50 PM

137 points

19 comments · 56 min read · LW link · 1 review

(thezvi.wordpress.com)

[Question] Everyone’s mired in the deepest confusion, some of the time?

M. Y. Zuo

Feb 9, 2022, 2:53 AM

1 point

2 comments · 1 min read · LW link

The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit

Apr 27, 2022, 7:30 PM

30 points

28 comments · 12 min read · LW link

Deep Honesty

Aletheophile

May 7, 2024, 8:31 PM

158 points

25 comments · 9 min read · LW link

If Clarity Seems Like Death to Them

Zack_M_Davis

Dec 30, 2023, 5:40 PM

47 points

192 comments · 87 min read · LW link · 1 review

(unremediatedgender.space)

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Zack_M_Davis

Dec 27, 2019, 5:09 AM

128 points

43 comments · 8 min read · LW link · 2 reviews

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis

Jun 29, 2020, 2:45 AM

105 points

10 comments · 4 min read · LW link

Maybe Lying Doesn’t Exist

Zack_M_Davis

Oct 14, 2019, 7:04 AM

71 points

59 comments · 8 min read · LW link

“Rationalizing” and “Sitting Bolt Upright in Alarm.”

Raemon

Jul 8, 2019, 8:34 PM

45 points

56 comments · 4 min read · LW link

Can crimes be discussed literally?

Benquo

Mar 22, 2020, 8:17 PM

102 points

38 comments · 2 min read · LW link · 3 reviews

(benjaminrosshoffman.com)

Precursor checking for deceptive alignment

evhub

Aug 3, 2022, 10:56 PM

24 points

0 comments · 14 min read · LW link

Don’t Double-Crux With Suicide Rock

Zack_M_Davis

Jan 1, 2020, 7:02 PM

91 points

30 comments · 2 min read · LW link

Superintelligence 11: The treacherous turn

KatjaGrace

Nov 25, 2014, 2:00 AM

16 points

50 comments · 6 min read · LW link

How likely is deceptive alignment?

evhub

Aug 30, 2022, 7:34 PM

104 points

28 comments · 60 min read · LW link

Lying Alignment Chart

Zack_M_Davis

Nov 29, 2023, 4:15 PM

77 points

17 comments · 1 min read · LW link

“On Bullshit” and “On Truth,” by Harry Frankfurt

Callmesalticidae

Aug 28, 2020, 12:44 AM

20 points

3 comments · 6 min read · LW link

When Someone Tells You They’re Lying, Believe Them

ymeskhout

Jul 14, 2023, 12:31 AM

95 points

3 comments · 3 min read · LW link

How to Corner Liars: A Miasma-Clearing Protocol

ymeskhout

Feb 27, 2025, 5:18 PM

60 points

23 comments · 7 min read · LW link

(www.ymeskhout.com)

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison and Ethan Perez

Aug 8, 2023, 1:30 AM

318 points

30 comments · 18 min read · LW link · 1 review

Contract Fraud

jefftk

Mar 1, 2023, 3:10 AM

86 points

10 comments · 1 min read · LW link

(www.jefftk.com)

[Question] Why do so many think deception in AI is important?

Prometheus

Jan 13, 2024, 8:14 AM

24 points

12 comments · 1 min read · LW link

Notes on Honesty

David Gross

Oct 28, 2020, 12:54 AM

46 points

6 comments · 20 min read · LW link

Notes on Sincerity and such

David Gross

Dec 1, 2020, 5:09 AM

9 points

2 comments · 10 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis

Feb 26, 2023, 5:58 AM

136 points

153 comments · 9 min read · LW link · 1 review

Difficulty classes for alignment properties

Jozdien

Feb 20, 2024, 9:08 AM

34 points

5 comments · 2 min read · LW link

[Question] Why is o1 so deceptive?

abramdemski

Sep 27, 2024, 5:27 PM

180 points

24 comments · 3 min read · LW link

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov

Dec 19, 2023, 4:49 PM

17 points

5 comments · 3 min read · LW link

Lying is Cowardice, not Strategy

Connor Leahy and Gabriel Alfour

Oct 24, 2023, 1:24 PM

29 points

73 comments · 5 min read · LW link

(cognition.cafe)

Less Wrong Poetry Corner: Coventry Patmore’s “Magna Est Veritas”

Zack_M_Davis

Jan 30, 2021, 5:16 AM

15 points

1 comment · 1 min read · LW link

Overconfidence is Deceit

Duncan Sabien (Deactivated)

Feb 17, 2021, 10:45 AM

78 points

29 comments · 11 min read · LW link

Unnatural Categories Are Optimized for Deception

Zack_M_Davis

Jan 8, 2021, 8:54 PM

89 points

29 comments · 33 min read · LW link · 1 review

Communication Requires Common Interests or Differential Signal Costs

Zack_M_Davis

Mar 26, 2021, 6:41 AM

40 points

13 comments · 3 min read · LW link · 1 review

Universality Unwrapped

adamShimi

Aug 21, 2020, 6:53 PM

29 points

2 comments · 18 min read · LW link

Functional silence: communication that minimizes change of receiver’s beliefs

chaosmage

Feb 12, 2019, 9:32 PM

27 points

5 comments · 2 min read · LW link

Of Lies and Black Swan Blowups

Eliezer Yudkowsky

Apr 7, 2009, 6:26 PM

31 points

8 comments · 1 min read · LW link

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea

Aug 3, 2023, 5:28 PM

12 points

0 comments · 1 min read · LW link

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong

Jan 18, 2017, 3:57 PM

3 points

0 comments · 1 min read · LW link

Blatant lies are the best kind!

Benquo

Jul 3, 2019, 8:45 PM

26 points

17 comments · 5 min read · LW link

(benjaminrosshoffman.com)

[LINK] EA Has A Lying Problem

Benquo

Jan 11, 2017, 10:31 PM

28 points

34 comments · 1 min read · LW link

(srconstantin.wordpress.com)

Matching donation fundraisers can be harmfully dishonest.

Benquo

Nov 11, 2016, 9:05 PM

18 points

6 comments · 14 min read · LW link

Misleading the witness

Bo102010

Aug 9, 2009, 8:13 PM

16 points

116 comments · 2 min read · LW link

The Santa deception: how did it affect you?

Desrtopa

Dec 20, 2010, 10:27 PM

30 points

204 comments · 1 min read · LW link

How to solve deception and still fail.

Charlie Steiner

Oct 4, 2023, 7:56 PM

40 points

7 comments · 6 min read · LW link

Thoughts On (Solving) Deep Deception

Jozdien

Oct 21, 2023, 10:40 PM

72 points

6 comments · 6 min read · LW link

Why I’m Worried About AI

peterbarnett

May 23, 2022, 9:13 PM

22 points

2 comments · 12 min read · LW link

Training Trace Priors

Adam Jermyn

Jun 13, 2022, 2:22 PM

12 points

17 comments · 4 min read · LW link

Multigate Priors

Adam Jermyn

Jun 15, 2022, 7:30 PM

4 points

0 comments · 3 min read · LW link

Conditioning Generative Models

Adam Jermyn

Jun 25, 2022, 10:15 PM

24 points

18 comments · 10 min read · LW link

Formalizing Deception

JamesH

Jun 26, 2022, 5:39 PM

14 points

2 comments · 5 min read · LW link

Training Trace Priors and Speed Priors

Adam Jermyn

Jun 26, 2022, 6:07 PM

17 points

0 comments · 3 min read · LW link

Latent Adversarial Training

Adam Jermyn

Jun 29, 2022, 8:04 PM

52 points

13 comments · 5 min read · LW link

Modelling Deception

Garrett Baker

Jul 18, 2022, 9:21 PM

15 points

0 comments · 7 min read · LW link

Deception as the optimal: mesa-optimizers and inner alignment

Eleni Angelou

Aug 16, 2022, 4:49 AM

11 points

0 comments · 5 min read · LW link

Three scenarios of pseudo-alignment

Eleni Angelou

Sep 3, 2022, 12:47 PM

9 points

0 comments · 3 min read · LW link

Monitoring for deceptive alignment

evhub

Sep 8, 2022, 11:07 PM

135 points

8 comments · 9 min read · LW link

Getting up to Speed on the Speed Prior in 2022

robertzk

Dec 28, 2022, 7:49 AM

36 points

5 comments · 65 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones

Dec 29, 2022, 11:30 AM

5 points

1 comment · 4 min read · LW link

(shape-of-code.com)

Deceptive failures short of full catastrophe.

Alex Lawsen

Jan 15, 2023, 7:28 PM

33 points

5 comments · 9 min read · LW link

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper

Feb 19, 2023, 3:25 PM

30 points

5 comments · 4 min read · LW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah

Dec 18, 2023, 11:58 AM

147 points

21 comments · 10 min read · LW link

Deception Chess

Chris Land

Jan 1, 2024, 3:40 PM

7 points

2 comments · 4 min read · LW link

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti

Jan 7, 2024, 5:56 PM

1 point

0 comments · 1 min read · LW link

Finding Deception in Language Models

Esben Kran and Archana Vaidheeswaran

Aug 20, 2024, 9:42 AM

18 points

4 comments · 4 min read · LW link

My Clients, The Liars

ymeskhout

Mar 5, 2024, 9:06 PM

247 points

86 comments · 7 min read · LW link

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov

Jan 24, 2024, 3:45 PM

33 points

4 comments · 11 min read · LW link

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky

Mar 14, 2024, 2:02 AM

171 points

92 comments · 25 min read · LW link

Among Us: A Sandbox for Agentic Deception

7vik and Adrià Garriga-alonso

Apr 5, 2025, 6:24 AM

103 points

4 comments · 7 min read · LW link

Inducing Unprompted Misalignment in LLMs

Sam Svenningsen, evhub and Henry Sleight

Apr 19, 2024, 8:00 PM

38 points

7 comments · 16 min read · LW link

Sparse Features Through Time

Rogan Inglis

Jun 24, 2024, 6:06 PM

12 points

1 comment · 1 min read · LW link

(roganinglis.io)

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang

Oct 22, 2024, 1:57 PM

51 points

2 comments · 18 min read · LW link

(arxiv.org)

Secret Collusion: Will We Know When to Unplug AI?

schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi and sofmonk

Sep 16, 2024, 4:07 PM

56 points

7 comments · 31 min read · LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid

Aug 2, 2024, 5:53 PM

5 points

2 comments · 7 min read · LW link

Let’s use AI to harden human defenses against AI manipulation

Tom Davidson

May 17, 2023, 11:33 PM

35 points

7 comments · 24 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Marcus Williams, micahcarroll, Adhyyan Narang, Constantin Weisser and Brendan Murphy

Nov 7, 2024, 3:39 PM

51 points

7 comments · 11 min read · LW link

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl

Oct 18, 2024, 10:37 AM

8 points

0 comments · 4 min read · LW link

Cautions about LLMs in Human Cognitive Loops

Alice Blair

Mar 2, 2025, 7:53 PM

38 points

9 comments · 7 min read · LW link

The present perfect tense is ruining your life

PatrickDFarley

Jan 27, 2025, 4:14 PM

24 points

14 comments · 8 min read · LW link

AI x-risk, approximately ordered by embarrassment

Alex Lawsen

Apr 12, 2023, 11:01 PM

151 points

7 comments · 19 min read · LW link

Research Report: Incorrectness Cascades

Robert_AIZI

Apr 14, 2023, 12:49 PM

19 points

0 comments · 10 min read · LW link

(aizi.substack.com)

Deception Strategies

Thoth Hermes

Apr 20, 2023, 3:59 PM

−7 points

2 comments · 5 min read · LW link

(thothhermes.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI

Apr 26, 2023, 5:45 PM

75 points

7 comments · 3 min read · LW link

(aizi.substack.com)

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau

Apr 26, 2023, 10:53 PM

16 points

2 comments · 2 min read · LW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

Nov 8, 2023, 11:37 AM

49 points

0 comments · 18 min read · LW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM

Nov 15, 2023, 4:36 PM

89 points

9 comments · 2 min read · LW link · 1 review

(arxiv.org)

An Increasingly Manipulative Newsfeed

Michaël Trazzi

Jul 1, 2019, 3:26 PM

63 points

16 comments · 5 min read · LW link

Cheerios: An “Untested New Drug”

MBlume

May 15, 2009, 2:26 AM

9 points

14 comments · 1 min read · LW link

How theism works

Paul Crowley

Apr 10, 2009, 4:16 PM

59 points

39 comments · 1 min read · LW link

Toy model of the AI control problem: animated version

Stuart_Armstrong

Oct 10, 2017, 11:06 AM

23 points

8 comments · 1 min read · LW link

Dishonest Update Reporting

Zvi

May 4, 2019, 2:10 PM

61 points

27 comments · 6 min read · LW link · 2 reviews

(thezvi.wordpress.com)

White Lies

ChrisHallquist

Feb 8, 2014, 1:20 AM

60 points

903 comments · 5 min read · LW link

Are minimal circuits deceptive?

evhub

Sep 7, 2019, 6:11 PM

78 points

11 comments · 8 min read · LW link

Will transparency help catch deception? Perhaps not

Matthew Barnett

Nov 4, 2019, 8:52 PM

43 points

5 comments · 7 min read · LW link

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong

Feb 6, 2020, 11:50 AM

38 points

25 comments · 3 min read · LW link

Why artificial optimism?

jessicata

Jul 15, 2019, 9:41 PM

67 points

29 comments · 4 min read · LW link

(unstableontology.com)

Entangled Truths, Contagious Lies

Eliezer Yudkowsky

Oct 15, 2008, 11:39 PM

106 points

42 comments · 4 min read · LW link

Knowing I’m Being Tricked is Barely Enough

Elizabeth

Feb 26, 2019, 5:50 PM

37 points

10 comments · 2 min read · LW link

(acesounderglass.com)

Not Technically Lying

Psychohistorian

Jul 4, 2009, 6:40 PM

50 points

86 comments · 4 min read · LW link

Sex, Lies, and Dexamethasone

Jacob Falkovich

Feb 20, 2018, 7:56 PM

15 points

1 comment · 9 min read · LW link

Attention! Financial scam targeting Less Wrong users

Viliam_Bur

May 14, 2016, 5:38 PM

38 points

92 comments · 2 min read · LW link

If we can’t lie to others, we will lie to ourselves

paulfchristiano

Nov 26, 2016, 10:29 PM

45 points

24 comments · 1 min read · LW link

(sideways-view.com)

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition

Jun 21, 2023, 8:08 AM

2 points

16 comments · 14 min read · LW link
