Deception

Tag · Last edit: Feb 8, 2023, 2:23 PM by Roman Leventov

Deception is the act of sharing information in a way that intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Maybe Lying Can’t Exist?!

Zack_M_Davis · Aug 23, 2020, 12:36 AM
58 points
16 comments · 5 min read · LW link

Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

Zack_M_Davis · Mar 2, 2024, 10:05 PM
29 points
25 comments · 58 min read · LW link
(unremediatedgender.space)

Algorithms of Deception!

Zack_M_Davis · Oct 19, 2019, 6:04 PM
24 points
7 comments · 5 min read · LW link

AI Deception: A Survey of Examples, Risks, and Potential Solutions

Aug 29, 2023, 1:29 AM
54 points
3 comments · 10 min read · LW link

Interpreting the Learning of Deceit

RogerDearnaley · Dec 18, 2023, 8:12 AM
30 points
14 comments · 9 min read · LW link

Conflict Theory of Bounded Distrust

Zack_M_Davis · Feb 12, 2023, 5:30 AM
112 points
33 comments · 3 min read · LW link · 1 review

Deconfusing Deception

J Bostock · Jan 29, 2022, 4:43 PM
28 points
6 comments · 2 min read · LW link

LCDT, A Myopic Decision Theory

Aug 3, 2021, 10:41 PM
57 points
50 comments · 15 min read · LW link

Deep Deceptiveness

So8res · Mar 21, 2023, 2:51 AM
247 points
60 comments · 14 min read · LW link · 1 review

On hiding the source of knowledge

jessicata · Jan 26, 2020, 2:48 AM
120 points
40 comments · 3 min read · LW link
(unstableontology.com)

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · Dec 19, 2023, 4:49 PM
17 points
5 comments · 3 min read · LW link

If Clarity Seems Like Death to Them

Zack_M_Davis · Dec 30, 2023, 5:40 PM
47 points
192 comments · 87 min read · LW link · 1 review
(unremediatedgender.space)

[Question] Why do so many think deception in AI is important?

Prometheus · Jan 13, 2024, 8:14 AM
24 points
12 comments · 1 min read · LW link

Difficulty classes for alignment properties

Jozdien · Feb 20, 2024, 9:08 AM
34 points
5 comments · 2 min read · LW link

[Question] Why is o1 so deceptive?

abramdemski · Sep 27, 2024, 5:27 PM
180 points
24 comments · 3 min read · LW link

Deep Honesty

Aletheophile · May 7, 2024, 8:31 PM
158 points
25 comments · 9 min read · LW link

How to Corner Liars: A Miasma-Clearing Protocol

ymeskhout · Feb 27, 2025, 5:18 PM
60 points
23 comments · 7 min read · LW link
(www.ymeskhout.com)

Lying is Cowardice, not Strategy

Oct 24, 2023, 1:24 PM
31 points
73 comments · 5 min read · LW link
(cognition.cafe)

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Zack_M_Davis · Dec 27, 2019, 5:09 AM
127 points
43 comments · 8 min read · LW link · 2 reviews

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis · Jun 29, 2020, 2:45 AM
105 points
10 comments · 4 min read · LW link

Maybe Lying Doesn’t Exist

Zack_M_Davis · Oct 14, 2019, 7:04 AM
70 points
59 comments · 8 min read · LW link

Can crimes be discussed literally?

Benquo · Mar 22, 2020, 8:17 PM
102 points
38 comments · 2 min read · LW link · 3 reviews
(benjaminrosshoffman.com)

Don’t Double-Crux With Suicide Rock

Zack_M_Davis · Jan 1, 2020, 7:02 PM
91 points
30 comments · 2 min read · LW link

Lying Alignment Chart

Zack_M_Davis · Nov 29, 2023, 4:15 PM
77 points
17 comments · 1 min read · LW link

“Rationalizing” and “Sitting Bolt Upright in Alarm.”

Raemon · Jul 8, 2019, 8:34 PM
45 points
56 comments · 4 min read · LW link

Superintelligence 11: The treacherous turn

KatjaGrace · Nov 25, 2014, 2:00 AM
16 points
50 comments · 6 min read · LW link

“On Bullshit” and “On Truth,” by Harry Frankfurt

Callmesalticidae · Aug 28, 2020, 12:44 AM
20 points
3 comments · 6 min read · LW link

When Someone Tells You They’re Lying, Believe Them

ymeskhout · Jul 14, 2023, 12:31 AM
95 points
3 comments · 3 min read · LW link

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

Aug 8, 2023, 1:30 AM
318 points
30 comments · 18 min read · LW link · 1 review

Less Wrong Poetry Corner: Coventry Patmore’s “Magna Est Veritas”

Zack_M_Davis · Jan 30, 2021, 5:16 AM
15 points
1 comment · 1 min read · LW link

Overconfidence is Deceit

Duncan Sabien (Deactivated) · Feb 17, 2021, 10:45 AM
78 points
29 comments · 11 min read · LW link

Unnatural Categories Are Optimized for Deception

Zack_M_Davis · Jan 8, 2021, 8:54 PM
89 points
29 comments · 33 min read · LW link · 1 review

Communication Requires Common Interests or Differential Signal Costs

Zack_M_Davis · Mar 26, 2021, 6:41 AM
40 points
13 comments · 3 min read · LW link · 1 review

[Book Review] “Houdini on Magic” by Harry Houdini

lsusr · Sep 29, 2021, 2:37 AM
21 points
1 comment · 6 min read · LW link

Comment on “Deception as Cooperation”

Zack_M_Davis · Nov 27, 2021, 4:04 AM
23 points
4 comments · 7 min read · LW link

On Bounded Distrust

Zvi · Feb 3, 2022, 2:50 PM
135 points
19 comments · 56 min read · LW link · 1 review
(thezvi.wordpress.com)

[Question] Everyone’s mired in the deepest confusion, some of the time?

M. Y. Zuo · Feb 9, 2022, 2:53 AM
1 point
2 comments · 1 min read · LW link

The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit · Apr 27, 2022, 7:30 PM
28 points
28 comments · 12 min read · LW link

Precursor checking for deceptive alignment

evhub · Aug 3, 2022, 10:56 PM
24 points
0 comments · 14 min read · LW link

How likely is deceptive alignment?

evhub · Aug 30, 2022, 7:34 PM
104 points
28 comments · 60 min read · LW link

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · Feb 26, 2023, 5:58 AM
136 points
153 comments · 9 min read · LW link · 1 review

Contract Fraud

jefftk · Mar 1, 2023, 3:10 AM
86 points
10 comments · 1 min read · LW link
(www.jefftk.com)

Notes on Honesty

David Gross · Oct 28, 2020, 12:54 AM
46 points
6 comments · 20 min read · LW link

Notes on Sincerity and such

David Gross · Dec 1, 2020, 5:09 AM
9 points
2 comments · 10 min read · LW link

Formalizing Deception

JamesH · Jun 26, 2022, 5:39 PM
14 points
2 comments · 5 min read · LW link

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti · Jan 7, 2024, 5:56 PM
1 point
0 comments · 1 min read · LW link

An Increasingly Manipulative Newsfeed

Michaël Trazzi · Jul 1, 2019, 3:26 PM
63 points
16 comments · 5 min read · LW link

Cheerios: An “Untested New Drug”

MBlume · May 15, 2009, 2:26 AM
9 points
14 comments · 1 min read · LW link

How theism works

Paul Crowley · Apr 10, 2009, 4:16 PM
59 points
39 comments · 1 min read · LW link

Toy model of the AI control problem: animated version

Stuart_Armstrong · Oct 10, 2017, 11:06 AM
23 points
8 comments · 1 min read · LW link

Dishonest Update Reporting

Zvi · May 4, 2019, 2:10 PM
61 points
27 comments · 6 min read · LW link · 2 reviews
(thezvi.wordpress.com)

White Lies

ChrisHallquist · Feb 8, 2014, 1:20 AM
60 points
903 comments · 5 min read · LW link

Are minimal circuits deceptive?

evhub · Sep 7, 2019, 6:11 PM
78 points
11 comments · 8 min read · LW link

Will transparency help catch deception? Perhaps not

Matthew Barnett · Nov 4, 2019, 8:52 PM
43 points
5 comments · 7 min read · LW link

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong · Feb 6, 2020, 11:50 AM
38 points
25 comments · 3 min read · LW link

Why artificial optimism?

jessicata · Jul 15, 2019, 9:41 PM
67 points
29 comments · 4 min read · LW link
(unstableontology.com)

Entangled Truths, Contagious Lies

Eliezer Yudkowsky · Oct 15, 2008, 11:39 PM
106 points
42 comments · 4 min read · LW link

Knowing I’m Being Tricked is Barely Enough

Elizabeth · Feb 26, 2019, 5:50 PM
37 points
10 comments · 2 min read · LW link
(acesounderglass.com)

Not Technically Lying

Psychohistorian · Jul 4, 2009, 6:40 PM
50 points
86 comments · 4 min read · LW link

Sex, Lies, and Dexamethasone

Jacob Falkovich · Feb 20, 2018, 7:56 PM
15 points
1 comment · 9 min read · LW link

Attention! Financial scam targeting Less Wrong users

Viliam_Bur · May 14, 2016, 5:38 PM
38 points
92 comments · 2 min read · LW link

If we can’t lie to others, we will lie to ourselves

paulfchristiano · Nov 26, 2016, 10:29 PM
45 points
24 comments · 1 min read · LW link
(sideways-view.com)

Training Trace Priors and Speed Priors

Adam Jermyn · Jun 26, 2022, 6:07 PM
17 points
0 comments · 3 min read · LW link

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition · Jun 21, 2023, 8:08 AM
2 points
16 comments · 14 min read · LW link

Universality Unwrapped

adamShimi · Aug 21, 2020, 6:53 PM
29 points
2 comments · 18 min read · LW link

Latent Adversarial Training

Adam Jermyn · Jun 29, 2022, 8:04 PM
52 points
13 comments · 5 min read · LW link

Functional silence: communication that minimizes change of receiver’s beliefs

chaosmage · Feb 12, 2019, 9:32 PM
27 points
5 comments · 2 min read · LW link

Of Lies and Black Swan Blowups

Eliezer Yudkowsky · Apr 7, 2009, 6:26 PM
31 points
8 comments · 1 min read · LW link

Modelling Deception

Garrett Baker · Jul 18, 2022, 9:21 PM
15 points
0 comments · 7 min read · LW link

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea · Aug 3, 2023, 5:28 PM
12 points
0 comments · 1 min read · LW link

Deception as the optimal: mesa-optimizers and inner alignment

Eleni Angelou · Aug 16, 2022, 4:49 AM
11 points
0 comments · 5 min read · LW link

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong · Jan 18, 2017, 3:57 PM
3 points
0 comments · 1 min read · LW link

Blatant lies are the best kind!

Benquo · Jul 3, 2019, 8:45 PM
26 points
17 comments · 5 min read · LW link
(benjaminrosshoffman.com)

[LINK] EA Has A Lying Problem

Benquo · Jan 11, 2017, 10:31 PM
28 points
34 comments · 1 min read · LW link
(srconstantin.wordpress.com)

Matching donation fundraisers can be harmfully dishonest.

Benquo · Nov 11, 2016, 9:05 PM
18 points
6 comments · 14 min read · LW link

Misleading the witness

Bo102010 · Aug 9, 2009, 8:13 PM
16 points
116 comments · 2 min read · LW link

The Santa deception: how did it affect you?

Desrtopa · Dec 20, 2010, 10:27 PM
30 points
204 comments · 1 min read · LW link

How to solve deception and still fail.

Charlie Steiner · Oct 4, 2023, 7:56 PM
40 points
7 comments · 6 min read · LW link

Thoughts On (Solving) Deep Deception

Jozdien · Oct 21, 2023, 10:40 PM
71 points
6 comments · 6 min read · LW link

Three scenarios of pseudo-alignment

Eleni Angelou · Sep 3, 2022, 12:47 PM
9 points
0 comments · 3 min read · LW link

Monitoring for deceptive alignment

evhub · Sep 8, 2022, 11:07 PM
135 points
8 comments · 9 min read · LW link

Getting up to Speed on the Speed Prior in 2022

robertzk · Dec 28, 2022, 7:49 AM
36 points
5 comments · 65 min read · LW link

Deception Chess

Chris Land · Jan 1, 2024, 3:40 PM
7 points
2 comments · 4 min read · LW link

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov · Jan 24, 2024, 3:45 PM
33 points
4 comments · 11 min read · LW link

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky · Mar 14, 2024, 2:02 AM
171 points
90 comments · 25 min read · LW link

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · Dec 29, 2022, 11:30 AM
5 points
1 comment · 4 min read · LW link
(shape-of-code.com)

Inducing Unprompted Misalignment in LLMs

Apr 19, 2024, 8:00 PM
38 points
7 comments · 16 min read · LW link

Sparse Features Through Time

Rogan Inglis · Jun 24, 2024, 6:06 PM
12 points
1 comment · 1 min read · LW link
(roganinglis.io)

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang · Oct 22, 2024, 1:57 PM
51 points
2 comments · 18 min read · LW link
(arxiv.org)

Secret Collusion: Will We Know When to Unplug AI?

Sep 16, 2024, 4:07 PM
56 points
7 comments · 31 min read · LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid · Aug 2, 2024, 5:53 PM
5 points
2 comments · 7 min read · LW link

Let’s use AI to harden human defenses against AI manipulation

Tom Davidson · May 17, 2023, 11:33 PM
35 points
7 comments · 24 min read · LW link

Deceptive failures short of full catastrophe.

Alex Lawsen · Jan 15, 2023, 7:28 PM
33 points
5 comments · 9 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

Nov 7, 2024, 3:39 PM
50 points
7 comments · 11 min read · LW link

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl · Oct 18, 2024, 10:37 AM
8 points
0 comments · 4 min read · LW link

Cautions about LLMs in Human Cognitive Loops

Alice Blair · Mar 2, 2025, 7:53 PM
38 points
9 comments · 7 min read · LW link

The present perfect tense is ruining your life

PatrickDFarley · Jan 27, 2025, 4:14 PM
24 points
14 comments · 8 min read · LW link

My Clients, The Liars

ymeskhout · Mar 5, 2024, 9:06 PM
246 points
86 comments · 7 min read · LW link

AI x-risk, approximately ordered by embarrassment

Alex Lawsen · Apr 12, 2023, 11:01 PM
151 points
7 comments · 19 min read · LW link

Research Report: Incorrectness Cascades

Robert_AIZI · Apr 14, 2023, 12:49 PM
19 points
0 comments · 10 min read · LW link
(aizi.substack.com)

Deception Strategies

Thoth Hermes · Apr 20, 2023, 3:59 PM
−7 points
2 comments · 5 min read · LW link
(thothhermes.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI · Apr 26, 2023, 5:45 PM
75 points
7 comments · 3 min read · LW link
(aizi.substack.com)

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau · Apr 26, 2023, 10:53 PM
16 points
2 comments · 2 min read · LW link

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

Dec 18, 2023, 11:58 AM
147 points
21 comments · 10 min read · LW link

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Nov 8, 2023, 11:37 AM
49 points
0 comments · 18 min read · LW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM · Nov 15, 2023, 4:36 PM
89 points
9 comments · 2 min read · LW link · 1 review
(arxiv.org)

Finding Deception in Language Models

Aug 20, 2024, 9:42 AM
18 points
4 comments · 4 min read · LW link

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper · Feb 19, 2023, 3:25 PM
30 points
5 comments · 4 min read · LW link

Why I’m Worried About AI

peterbarnett · May 23, 2022, 9:13 PM
22 points
2 comments · 12 min read · LW link

Training Trace Priors

Adam Jermyn · Jun 13, 2022, 2:22 PM
12 points
17 comments · 4 min read · LW link

Multigate Priors

Adam Jermyn · Jun 15, 2022, 7:30 PM
4 points
0 comments · 3 min read · LW link

Conditioning Generative Models

Adam Jermyn · Jun 25, 2022, 10:15 PM
24 points
18 comments · 10 min read · LW link