Count­ing ar­gu­ments provide no ev­i­dence for AI doom

27 Feb 2024 23:03 UTC
95 points
188 comments14 min readLW link

Which an­i­mals re­al­ize which types of sub­jec­tive welfare?

MichaelStJules27 Feb 2024 19:31 UTC
4 points
0 comments1 min readLW link

Biose­cu­rity and AI: Risks and Opportunities

Steve Newman27 Feb 2024 18:45 UTC
11 points
1 comment7 min readLW link
(www.safe.ai)

The Gem­ini In­ci­dent Continues

Zvi27 Feb 2024 16:00 UTC
45 points
6 comments48 min readLW link
(thezvi.wordpress.com)

How I in­ter­nal­ized my achieve­ments to bet­ter deal with nega­tive feelings

Raymond Koopmanschap27 Feb 2024 15:10 UTC
42 points
7 comments6 min readLW link

On Frus­tra­tion and Regret

silentbob27 Feb 2024 12:19 UTC
8 points
0 comments4 min readLW link

Facts vs In­ter­pre­ta­tions—An Ex­er­cise in Cog­ni­tive Reframing

Declan Molony27 Feb 2024 7:57 UTC
15 points
0 comments3 min readLW link

San Fran­cisco ACX Meetup “Third Satur­day”

27 Feb 2024 7:07 UTC
7 points
0 comments1 min readLW link

Ex­am­in­ing Lan­guage Model Perfor­mance with Re­con­structed Ac­ti­va­tions us­ing Sparse Au­toen­coders

27 Feb 2024 2:43 UTC
42 points
16 comments15 min readLW link

Pro­ject idea: an iter­ated pris­oner’s dilemma com­pe­ti­tion/​game

Adam Zerner26 Feb 2024 23:06 UTC
8 points
0 comments5 min readLW link

Act­ing Wholesomely

owencb26 Feb 2024 21:49 UTC
58 points
64 comments1 min readLW link

Get­ting ra­tio­nal now or later: nav­i­gat­ing pro­cras­ti­na­tion and time-in­con­sis­tent prefer­ences for new ra­tio­nal­ists

milo_thoughts26 Feb 2024 19:38 UTC
1 point
0 comments8 min readLW link

[Question] Whom Do You Trust?

JackOfAllTrades26 Feb 2024 19:38 UTC
1 point
0 comments1 min readLW link

Boundary Vio­la­tions vs Boundary Dissolution

Chipmonk26 Feb 2024 18:59 UTC
8 points
4 comments1 min readLW link

[Question] Can we get an AI to “do our al­ign­ment home­work for us”?

Chris_Leong26 Feb 2024 7:56 UTC
53 points
33 comments1 min readLW link

How I build and run be­hav­ioral interviews

benkuhn26 Feb 2024 5:50 UTC
32 points
6 comments4 min readLW link
(www.benkuhn.net)

Hid­den Cog­ni­tion De­tec­tion Meth­ods and Bench­marks

Paul Colognese26 Feb 2024 5:31 UTC
22 points
11 comments4 min readLW link

Cel­lu­lar res­pi­ra­tion as a steam engine

dkl925 Feb 2024 20:17 UTC
24 points
1 comment1 min readLW link
(dkl9.net)

[Question] Ra­tion­al­ism and Depen­dent Origi­na­tion?

Baometrus25 Feb 2024 18:16 UTC
2 points
3 comments1 min readLW link

China-AI forecasts

NathanBarnard25 Feb 2024 16:49 UTC
39 points
29 comments6 min readLW link

Ide­olog­i­cal Bayesians

Kevin Dorst25 Feb 2024 14:17 UTC
95 points
4 comments10 min readLW link
(kevindorst.substack.com)

De­con­fus­ing In-Con­text Learning

Arjun Panickssery25 Feb 2024 9:48 UTC
37 points
1 comment2 min readLW link

Everett branches, in­ter-light cone trade and other alien mat­ters: Ap­pendix to “An ECL ex­plainer”

24 Feb 2024 23:09 UTC
17 points
0 comments1 min readLW link

Co­op­er­at­ing with aliens and AGIs: An ECL explainer

24 Feb 2024 22:58 UTC
51 points
8 comments1 min readLW link

Choos­ing My Quest (Part 2 of “The Sense Of Phys­i­cal Ne­ces­sity”)

LoganStrohl24 Feb 2024 21:31 UTC
40 points
7 comments12 min readLW link

Ra­tion­al­ity Re­search Re­port: Towards 10x OODA Loop­ing?

Raemon24 Feb 2024 21:06 UTC
114 points
21 comments15 min readLW link

Let’s ask some of the largest LLMs for tips and ideas on how to take over the world

Super AGI24 Feb 2024 20:35 UTC
1 point
0 comments7 min readLW link

Ex­er­cise: Plan­mak­ing, Sur­prise An­ti­ci­pa­tion, and “Baba is You”

Raemon24 Feb 2024 20:33 UTC
49 points
19 comments6 min readLW link

In search of God.

Spiritus Dei24 Feb 2024 18:59 UTC
−19 points
3 comments7 min readLW link

Im­pos­si­bil­ity of An­thro­pocen­tric-Alignment

False Name24 Feb 2024 18:31 UTC
−8 points
2 comments39 min readLW link

The In­ner Align­ment Problem

Jakub Halmeš24 Feb 2024 17:55 UTC
1 point
1 comment3 min readLW link
(jakubhalmes.substack.com)

We Need Ma­jor, But Not Rad­i­cal, FDA Reform

Maxwell Tabarrok24 Feb 2024 16:54 UTC
42 points
12 comments7 min readLW link
(www.maximum-progress.com)

After Over­mor­row: Scat­tered Mus­ings on the Im­me­di­ate Post-AGI World

Yuli_Ban24 Feb 2024 15:49 UTC
−3 points
0 comments26 min readLW link

[Question] CDT vs. EDT on Deterrence

notfnofn24 Feb 2024 15:41 UTC
1 point
9 comments1 min readLW link

Balanc­ing Games

jefftk24 Feb 2024 14:40 UTC
61 points
18 comments1 min readLW link
(www.jefftk.com)

How well do truth probes gen­er­al­ise?

mishajw24 Feb 2024 14:12 UTC
87 points
11 comments9 min readLW link

Rawls’s Veil of Ig­no­rance Doesn’t Make Any Sense

Arjun Panickssery24 Feb 2024 13:18 UTC
10 points
9 comments1 min readLW link

[Question] Can some­one ex­plain to me what went wrong with ChatGPT?

Valentin Baltadzhiev24 Feb 2024 11:50 UTC
9 points
1 comment1 min readLW link

The Sense Of Phys­i­cal Ne­ces­sity: A Nat­u­ral­ism Demo (In­tro­duc­tion)

LoganStrohl24 Feb 2024 2:56 UTC
59 points
1 comment6 min readLW link

In­stru­men­tal de­cep­tion and ma­nipu­la­tion in LLMs—a case study

Olli Järviniemi24 Feb 2024 2:07 UTC
39 points
13 comments12 min readLW link

A start­ing point for mak­ing sense of task struc­ture (in ma­chine learn­ing)

24 Feb 2024 1:51 UTC
45 points
2 comments12 min readLW link

Why you, per­son­ally, should want a larger hu­man population

jasoncrawford23 Feb 2024 19:48 UTC
32 points
32 comments5 min readLW link
(rootsofprogress.org)

De­liber­a­tive Cog­ni­tive Al­gorithms as Scaffolding

Cole Wyeth23 Feb 2024 17:15 UTC
19 points
4 comments3 min readLW link

The Shut­down Prob­lem: In­com­plete Prefer­ences as a Solution

EJT23 Feb 2024 16:01 UTC
52 points
29 comments42 min readLW link

In set the­ory, ev­ery­thing is a set

Jacob G-W23 Feb 2024 14:35 UTC
11 points
9 comments2 min readLW link

The role of philo­soph­i­cal think­ing in un­der­stand­ing large lan­guage mod­els: Cal­ibrat­ing and clos­ing the gap be­tween first-per­son ex­pe­rience and un­der­ly­ing mechanisms

Bill Benzon23 Feb 2024 12:19 UTC
4 points
0 comments10 min readLW link

Deep and ob­vi­ous points in the gap be­tween your thoughts and your pic­tures of thought

KatjaGrace23 Feb 2024 7:30 UTC
42 points
6 comments1 min readLW link
(worldspiritsockpuppet.com)

Paraso­cial re­la­tion­ship logic

KatjaGrace23 Feb 2024 7:30 UTC
20 points
1 comment1 min readLW link
(worldspiritsockpuppet.com)

Sham­ing with and with­out naming

KatjaGrace23 Feb 2024 7:30 UTC
15 points
5 comments2 min readLW link
(worldspiritsockpuppet.com)

Com­plex­ity of value but not dis­value im­plies more fo­cus on s-risk. Mo­ral un­cer­tainty and prefer­ence util­i­tar­i­anism also do.

Chi Nguyen23 Feb 2024 6:10 UTC
52 points
18 comments1 min readLW link