Deception

Tag · Last edit: 8 Feb 2023 14:23 UTC by Roman Leventov

Deception is the act of sharing information in a way which intentionally misleads others.

Related Pages: Deceptive Alignment, Honesty, Meta-Honesty, Self-Deception, Simulacrum Levels

Maybe Lying Can’t Exist?!

Zack_M_Davis · 23 Aug 2020 0:36 UTC
58 points
16 comments · 5 min read

Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles

Zack_M_Davis · 2 Mar 2024 22:05 UTC
26 points
22 comments · 58 min read
(unremediatedgender.space)

Algorithms of Deception!

Zack_M_Davis · 19 Oct 2019 18:04 UTC
23 points
7 comments · 5 min read

AI Deception: A Survey of Examples, Risks, and Potential Solutions

29 Aug 2023 1:29 UTC
53 points
3 comments · 10 min read

Interpreting the Learning of Deceit

RogerDearnaley · 18 Dec 2023 8:12 UTC
30 points
14 comments · 9 min read

Conflict Theory of Bounded Distrust

Zack_M_Davis · 12 Feb 2023 5:30 UTC
108 points
30 comments · 3 min read · 1 review

Deconfusing Deception

J Bostock · 29 Jan 2022 16:43 UTC
28 points
6 comments · 2 min read

LCDT, A Myopic Decision Theory

3 Aug 2021 22:41 UTC
57 points
50 comments · 15 min read

Deep Deceptiveness

So8res · 21 Mar 2023 2:51 UTC
238 points
59 comments · 14 min read

On hiding the source of knowledge

jessicata · 26 Jan 2020 2:48 UTC
115 points
40 comments · 3 min read
(unstableontology.com)

SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov · 19 Dec 2023 16:49 UTC
17 points
5 comments · 3 min read

If Clarity Seems Like Death to Them

Zack_M_Davis · 30 Dec 2023 17:40 UTC
45 points
192 comments · 87 min read · 1 review
(unremediatedgender.space)

[Question] Why do so many think deception in AI is important?

Prometheus · 13 Jan 2024 8:14 UTC
23 points
12 comments · 1 min read

Difficulty classes for alignment properties

Jozdien · 20 Feb 2024 9:08 UTC
34 points
5 comments · 2 min read

[Question] Why is o1 so deceptive?

abramdemski · 27 Sep 2024 17:27 UTC
177 points
24 comments · 3 min read

Deep Honesty

Aletheophile · 7 May 2024 20:31 UTC
157 points
25 comments · 9 min read

Lying is Cowardice, not Strategy

24 Oct 2023 13:24 UTC
31 points
73 comments · 5 min read
(cognition.cafe)

Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Zack_M_Davis · 27 Dec 2019 5:09 UTC
127 points
43 comments · 8 min read · 2 reviews

Optimized Propaganda with Bayesian Networks: Comment on “Articulating Lay Theories Through Graphical Models”

Zack_M_Davis · 29 Jun 2020 2:45 UTC
105 points
10 comments · 4 min read

Maybe Lying Doesn’t Exist

Zack_M_Davis · 14 Oct 2019 7:04 UTC
70 points
58 comments · 8 min read

Can crimes be discussed literally?

Benquo · 22 Mar 2020 20:17 UTC
102 points
38 comments · 2 min read · 3 reviews
(benjaminrosshoffman.com)

Don’t Double-Crux With Suicide Rock

Zack_M_Davis · 1 Jan 2020 19:02 UTC
91 points
30 comments · 2 min read

Lying Alignment Chart

Zack_M_Davis · 29 Nov 2023 16:15 UTC
77 points
17 comments · 1 min read

“Rationalizing” and “Sitting Bolt Upright in Alarm.”

Raemon · 8 Jul 2019 20:34 UTC
45 points
56 comments · 4 min read

Superintelligence 11: The treacherous turn

KatjaGrace · 25 Nov 2014 2:00 UTC
16 points
50 comments · 6 min read

“On Bullshit” and “On Truth,” by Harry Frankfurt

Callmesalticidae · 28 Aug 2020 0:44 UTC
20 points
3 comments · 6 min read

When Someone Tells You They’re Lying, Believe Them

ymeskhout · 14 Jul 2023 0:31 UTC
95 points
3 comments · 3 min read

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

8 Aug 2023 1:30 UTC
312 points
29 comments · 18 min read · 1 review

Less Wrong Poetry Corner: Coventry Patmore’s “Magna Est Veritas”

Zack_M_Davis · 30 Jan 2021 5:16 UTC
15 points
1 comment · 1 min read

Overconfidence is Deceit

Duncan Sabien (Deactivated) · 17 Feb 2021 10:45 UTC
78 points
29 comments · 11 min read

Unnatural Categories Are Optimized for Deception

Zack_M_Davis · 8 Jan 2021 20:54 UTC
89 points
29 comments · 33 min read · 1 review

Communication Requires Common Interests or Differential Signal Costs

Zack_M_Davis · 26 Mar 2021 6:41 UTC
40 points
13 comments · 3 min read · 1 review

[Book Review] “Houdini on Magic” by Harry Houdini

lsusr · 29 Sep 2021 2:37 UTC
21 points
1 comment · 6 min read

Comment on “Deception as Cooperation”

Zack_M_Davis · 27 Nov 2021 4:04 UTC
23 points
4 comments · 7 min read

On Bounded Distrust

Zvi · 3 Feb 2022 14:50 UTC
135 points
19 comments · 56 min read · 1 review
(thezvi.wordpress.com)

[Question] Everyone’s mired in the deepest confusion, some of the time?

M. Y. Zuo · 9 Feb 2022 2:53 UTC
1 point
2 comments · 1 min read

The Speed + Simplicity Prior is probably anti-deceptive

Yonadav Shavit · 27 Apr 2022 19:30 UTC
28 points
28 comments · 12 min read

Precursor checking for deceptive alignment

evhub · 3 Aug 2022 22:56 UTC
24 points
0 comments · 14 min read

How likely is deceptive alignment?

evhub · 30 Aug 2022 19:34 UTC
103 points
28 comments · 60 min read

“Rationalist Discourse” Is Like “Physicist Motors”

Zack_M_Davis · 26 Feb 2023 5:58 UTC
136 points
153 comments · 9 min read · 1 review

Contract Fraud

jefftk · 1 Mar 2023 3:10 UTC
86 points
10 comments · 1 min read
(www.jefftk.com)

Notes on Honesty

David Gross · 28 Oct 2020 0:54 UTC
46 points
6 comments · 21 min read

Notes on Sincerity and such

David Gross · 1 Dec 2020 5:09 UTC
9 points
2 comments · 11 min read

An Increasingly Manipulative Newsfeed

Michaël Trazzi · 1 Jul 2019 15:26 UTC
63 points
16 comments · 5 min read

Cheerios: An “Untested New Drug”

MBlume · 15 May 2009 2:26 UTC
9 points
14 comments · 1 min read

How theism works

Paul Crowley · 10 Apr 2009 16:16 UTC
59 points
39 comments · 1 min read

Toy model of the AI control problem: animated version

Stuart_Armstrong · 10 Oct 2017 11:06 UTC
23 points
8 comments · 1 min read

Dishonest Update Reporting

Zvi · 4 May 2019 14:10 UTC
61 points
27 comments · 6 min read · 2 reviews
(thezvi.wordpress.com)

White Lies

ChrisHallquist · 8 Feb 2014 1:20 UTC
60 points
902 comments · 5 min read

Are minimal circuits deceptive?

evhub · 7 Sep 2019 18:11 UTC
78 points
11 comments · 8 min read

Will transparency help catch deception? Perhaps not

Matthew Barnett · 4 Nov 2019 20:52 UTC
43 points
5 comments · 7 min read

Plausibly, almost every powerful algorithm would be manipulative

Stuart_Armstrong · 6 Feb 2020 11:50 UTC
38 points
25 comments · 3 min read

Why artificial optimism?

jessicata · 15 Jul 2019 21:41 UTC
67 points
29 comments · 4 min read
(unstableontology.com)

Entangled Truths, Contagious Lies

Eliezer Yudkowsky · 15 Oct 2008 23:39 UTC
106 points
42 comments · 4 min read

Knowing I’m Being Tricked is Barely Enough

Elizabeth · 26 Feb 2019 17:50 UTC
37 points
10 comments · 2 min read
(acesounderglass.com)

Not Technically Lying

Psychohistorian · 4 Jul 2009 18:40 UTC
50 points
86 comments · 4 min read

Sex, Lies, and Dexamethasone

Jacob Falkovich · 20 Feb 2018 19:56 UTC
15 points
1 comment · 9 min read

Attention! Financial scam targeting Less Wrong users

Viliam_Bur · 14 May 2016 17:38 UTC
38 points
92 comments · 2 min read

If we can’t lie to others, we will lie to ourselves

paulfchristiano · 26 Nov 2016 22:29 UTC
45 points
24 comments · 1 min read
(sideways-view.com)

Training Trace Priors and Speed Priors

Adam Jermyn · 26 Jun 2022 18:07 UTC
17 points
0 comments · 3 min read

A way to make solving alignment 10.000 times easier. The shorter case for a massive open source simbox project.

AlexFromSafeTransition · 21 Jun 2023 8:08 UTC
2 points
16 comments · 14 min read

Universality Unwrapped

adamShimi · 21 Aug 2020 18:53 UTC
29 points
2 comments · 18 min read

Latent Adversarial Training

Adam Jermyn · 29 Jun 2022 20:04 UTC
50 points
13 comments · 5 min read

Functional silence: communication that minimizes change of receiver’s beliefs

chaosmage · 12 Feb 2019 21:32 UTC
27 points
5 comments · 2 min read

Of Lies and Black Swan Blowups

Eliezer Yudkowsky · 7 Apr 2009 18:26 UTC
30 points
8 comments · 1 min read

Modelling Deception

Garrett Baker · 18 Jul 2022 21:21 UTC
15 points
0 comments · 7 min read

[Linkpost] Deception Abilities Emerged in Large Language Models

Bogdan Ionut Cirstea · 3 Aug 2023 17:28 UTC
12 points
0 comments · 1 min read

Deception as the optimal: mesa-optimizers and inner alignment

Eleni Angelou · 16 Aug 2022 4:49 UTC
11 points
0 comments · 5 min read

Corrigibility thoughts III: manipulating versus deceiving

Stuart_Armstrong · 18 Jan 2017 15:57 UTC
3 points
0 comments · 1 min read

Blatant lies are the best kind!

Benquo · 3 Jul 2019 20:45 UTC
28 points
17 comments · 5 min read
(benjaminrosshoffman.com)

[LINK] EA Has A Lying Problem

Benquo · 11 Jan 2017 22:31 UTC
28 points
34 comments · 1 min read
(srconstantin.wordpress.com)

Matching donation fundraisers can be harmfully dishonest.

Benquo · 11 Nov 2016 21:05 UTC
18 points
6 comments · 14 min read

Misleading the witness

Bo102010 · 9 Aug 2009 20:13 UTC
16 points
116 comments · 2 min read

The Santa deception: how did it affect you?

Desrtopa · 20 Dec 2010 22:27 UTC
30 points
204 comments · 1 min read

How to solve deception and still fail.

Charlie Steiner · 4 Oct 2023 19:56 UTC
40 points
7 comments · 6 min read

Thoughts On (Solving) Deep Deception

Jozdien · 21 Oct 2023 22:40 UTC
69 points
4 comments · 6 min read

Three scenarios of pseudo-alignment

Eleni Angelou · 3 Sep 2022 12:47 UTC
9 points
0 comments · 3 min read

Monitoring for deceptive alignment

evhub · 8 Sep 2022 23:07 UTC
135 points
8 comments · 9 min read

Getting up to Speed on the Speed Prior in 2022

robertzk · 28 Dec 2022 7:49 UTC
36 points
5 comments · 65 min read

Deception Chess

Chris Land · 1 Jan 2024 15:40 UTC
7 points
2 comments · 4 min read

The commercial incentive to intentionally train AI to deceive us

Derek M. Jones · 29 Dec 2022 11:30 UTC
5 points
1 comment · 4 min read
(shape-of-code.com)

LLMs can strategically deceive while doing gain-of-function research

Igor Ivanov · 24 Jan 2024 15:45 UTC
33 points
4 comments · 11 min read

‘Empiricism!’ as Anti-Epistemology

Eliezer Yudkowsky · 14 Mar 2024 2:02 UTC
171 points
90 comments · 25 min read

Deceptive failures short of full catastrophe.

Alex Lawsen · 15 Jan 2023 19:28 UTC
33 points
5 comments · 9 min read

Inducing Unprompted Misalignment in LLMs

19 Apr 2024 20:00 UTC
38 points
6 comments · 16 min read

Sparse Features Through Time

Rogan Inglis · 24 Jun 2024 18:06 UTC
12 points
1 comment · 1 min read
(roganinglis.io)

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang · 22 Oct 2024 13:57 UTC
50 points
1 comment · 18 min read
(arxiv.org)

Secret Collusion: Will We Know When to Unplug AI?

16 Sep 2024 16:07 UTC
55 points
7 comments · 31 min read

Ethical Deception: Should AI Ever Lie?

Jason Reid · 2 Aug 2024 17:53 UTC
5 points
2 comments · 7 min read

Let’s use AI to harden human defenses against AI manipulation

Tom Davidson · 17 May 2023 23:33 UTC
34 points
7 comments · 24 min read

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

7 Nov 2024 15:39 UTC
47 points
6 comments · 11 min read

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl · 18 Oct 2024 10:37 UTC
8 points
0 comments · 4 min read

My Clients, The Liars

ymeskhout · 5 Mar 2024 21:06 UTC
248 points
85 comments · 7 min read

AI x-risk, approximately ordered by embarrassment

Alex Lawsen · 12 Apr 2023 23:01 UTC
151 points
7 comments · 19 min read

Research Report: Incorrectness Cascades

Robert_AIZI · 14 Apr 2023 12:49 UTC
19 points
0 comments · 10 min read
(aizi.substack.com)

Deception Strategies

Thoth Hermes · 20 Apr 2023 15:59 UTC
−7 points
2 comments · 5 min read
(thothhermes.substack.com)

I was Wrong, Simulator Theory is Real

Robert_AIZI · 26 Apr 2023 17:45 UTC
75 points
7 comments · 3 min read
(aizi.substack.com)

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob Pfau · 26 Apr 2023 22:53 UTC
16 points
2 comments · 2 min read

Discussion: Challenges with Unsupervised LLM Knowledge Discovery

18 Dec 2023 11:58 UTC
147 points
21 comments · 10 min read

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

8 Nov 2023 11:37 UTC
49 points
0 comments · 18 min read

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM · 15 Nov 2023 16:36 UTC
89 points
9 comments · 2 min read · 1 review
(arxiv.org)

Finding Deception in Language Models

20 Aug 2024 9:42 UTC
18 points
4 comments · 4 min read

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper · 19 Feb 2023 15:25 UTC
30 points
5 comments · 4 min read

Why I’m Worried About AI

peterbarnett · 23 May 2022 21:13 UTC
22 points
2 comments · 12 min read

Training Trace Priors

Adam Jermyn · 13 Jun 2022 14:22 UTC
12 points
17 comments · 4 min read

Multigate Priors

Adam Jermyn · 15 Jun 2022 19:30 UTC
4 points
0 comments · 3 min read

Conditioning Generative Models

Adam Jermyn · 25 Jun 2022 22:15 UTC
24 points
18 comments · 10 min read

Formalizing Deception

JamesH · 26 Jun 2022 17:39 UTC
14 points
2 comments · 5 min read

(Partial) failure in replicating deceptive alignment experiment

claudia.biancotti · 7 Jan 2024 17:56 UTC
1 point
0 comments · 1 min read