Corrigibility or DWIM is an attractive primary goal for AGI

Seth Herd 25 Nov 2023 19:37 UTC
16 points
4 comments 1 min read LW link

On “slack” in training (Section 1.5 of “Scheming AIs”)

Joe Carlsmith 25 Nov 2023 17:51 UTC
1 point
0 comments 5 min read LW link

Announcing New Beginner-friendly Book on AI Safety and Risk

Darren McKee 25 Nov 2023 15:57 UTC
64 points
2 comments 1 min read LW link

Fertility as Metascience

Maxwell Tabarrok 25 Nov 2023 15:42 UTC
20 points
1 comment 3 min read LW link
(maximumprogress.substack.com)

Reaction to “Empowerment is (almost) All We Need”: an open-ended alternative

Ryo 25 Nov 2023 15:35 UTC
9 points
3 comments 5 min read LW link

How Microsoft’s ruthless employee evaluation system annihilated team collaboration.

positivesum 25 Nov 2023 13:25 UTC
3 points
2 comments 1 min read LW link
(tryingtruly.substack.com)

What are the results of more parental supervision and less outdoor play?

juliawise 25 Nov 2023 12:52 UTC
221 points
31 comments 5 min read LW link

A simple treacherous turn demonstration

nikola 25 Nov 2023 4:51 UTC
22 points
5 comments 3 min read LW link

The two paragraph argument for AI risk

CronoDAS 25 Nov 2023 2:01 UTC
19 points
8 comments 1 min read LW link

Goodhart’s Law Example: Training Verifiers to Solve Math Word Problems

Chris_Leong 25 Nov 2023 0:53 UTC
27 points
2 comments 1 min read LW link
(arxiv.org)

Some thoughts on CBDC

PixelatedPenguin 25 Nov 2023 0:32 UTC
−1 points
1 comment 1 min read LW link

Testing for consequence-blindness in LLMs using the HI-ADS unit test.

David Scott Krueger (formerly: capybaralet) 24 Nov 2023 23:35 UTC
25 points
2 comments 2 min read LW link

Epoch is hiring an ML Distributed Systems Senior Researcher

24 Nov 2023 22:33 UTC
2 points
0 comments 4 min read LW link
(careers.rethinkpriorities.org)

Article Discussion And Free Pizza – St Paul

25Hour 24 Nov 2023 21:02 UTC
1 point
0 comments 1 min read LW link

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)

Joe Carlsmith 24 Nov 2023 19:18 UTC
8 points
0 comments 22 min read LW link

Surviving and Shaping Long-Term Competitions: Lessons from Net Assessment

24 Nov 2023 18:18 UTC
5 points
0 comments 13 min read LW link

Ability to solve long-horizon tasks correlates with wanting things in the behaviorist sense

So8res 24 Nov 2023 17:37 UTC
206 points
83 comments 5 min read LW link

The Limitations of GPT-4

p.b. 24 Nov 2023 15:30 UTC
27 points
12 comments 4 min read LW link

Progress links digest, 2023-11-24: Bottlenecks of aging, Starship launches, and much more

jasoncrawford 24 Nov 2023 15:25 UTC
40 points
1 comment 14 min read LW link
(rootsofprogress.org)

[Question] What’s the evidence that LLMs will scale up efficiently beyond GPT4? i.e. couldn’t GPT5, etc., be very inefficient?

M. Y. Zuo 24 Nov 2023 15:22 UTC
9 points
6 comments 1 min read LW link

Sapience, understanding, and “AGI”

Seth Herd 24 Nov 2023 15:13 UTC
15 points
3 comments 6 min read LW link

Insulate your ideas

Logan Kieller 24 Nov 2023 14:08 UTC
18 points
5 comments 2 min read LW link
(logankieller.substack.com)

Bordeaux, Gironde, France – irregular ACX Meetup 2023-12-09

vi21maobk9vp 24 Nov 2023 11:17 UTC
5 points
1 comment 1 min read LW link

[Question] A Question For People Who Believe In God

yanni kyriacos 24 Nov 2023 5:22 UTC
3 points
38 comments 1 min read LW link

[Question] First and Last Questions for GPT-5*

Mitchell_Porter 24 Nov 2023 5:03 UTC
15 points
5 comments 1 min read LW link

4. A Moral Case for Evolved-Sapience-Chauvinism

RogerDearnaley 24 Nov 2023 4:56 UTC
10 points
0 comments 4 min read LW link

Detecting What’s Been Seen

jefftk 24 Nov 2023 3:30 UTC
23 points
0 comments 2 min read LW link
(www.jefftk.com)

[Question] Help to find a blog I don’t remember the name of

JavierCC 23 Nov 2023 22:49 UTC
3 points
2 comments 1 min read LW link

[Question] What did you change your mind about in the last year?

mike_hawke 23 Nov 2023 20:53 UTC
41 points
16 comments 1 min read LW link

A few Superhuman examples of Superaligned Superintelligence from Google Bard (Thanksgiving 2023)

23 Nov 2023 19:06 UTC
−9 points
1 comment 17 min read LW link

Prepsgiving, A Convergently Instrumental Human Practice

JenniferRM 23 Nov 2023 17:24 UTC
39 points
0 comments 7 min read LW link

AI #39: The Week of OpenAI

Zvi 23 Nov 2023 15:10 UTC
67 points
8 comments 28 min read LW link
(thezvi.wordpress.com)

3. Uploading

RogerDearnaley 23 Nov 2023 7:39 UTC
21 points
5 comments 8 min read LW link

2. AIs as Economic Agents

RogerDearnaley 23 Nov 2023 7:07 UTC
9 points
2 comments 6 min read LW link

Thomas Kwa’s research journal

23 Nov 2023 5:11 UTC
79 points
1 comment 6 min read LW link

Never Drop A Ball

Screwtape 23 Nov 2023 4:15 UTC
62 points
1 comment 6 min read LW link

Possible OpenAI’s Q* breakthrough and DeepMind’s AlphaGo-type systems plus LLMs

Burny 23 Nov 2023 3:16 UTC
37 points
25 comments 2 min read LW link

Boston Secular Solstice: Call for Singers and Musicians

jefftk 23 Nov 2023 2:40 UTC
16 points
2 comments 1 min read LW link
(www.jefftk.com)

My Mental Model of Infohazards

MadHatter 23 Nov 2023 2:37 UTC
7 points
33 comments 2 min read LW link

Saturating the Difficulty Levels of Alignment

Johannes C. Mayer 23 Nov 2023 0:39 UTC
6 points
0 comments 2 min read LW link

Sacramento LW/ACX Meetup

mcint 22 Nov 2023 23:52 UTC
1 point
0 comments 1 min read LW link

Sam Altman’s ouster at OpenAI was precipitated by letter to board about AI breakthrough – Reuters

Jonathan Yan 22 Nov 2023 23:17 UTC
18 points
11 comments 1 min read LW link
(www.reuters.com)

Foresight Institute: 2023 Progress & 2024 Plans for funding beneficial technology development

Allison Duettmann 22 Nov 2023 22:09 UTC
24 points
1 comment 6 min read LW link

AISC project: TinyEvals

Jett Janiak 22 Nov 2023 20:47 UTC
22 points
0 comments 4 min read LW link

The proposal to add a “Last Judge” to an AI, does not remove the urgency, of making progress on the “what alignment target should be aimed at?” question.

ThomasCederborg 22 Nov 2023 18:59 UTC
1 point
0 comments 18 min read LW link

Neither Copernicus, Galileo, nor Kepler had proof

Meow P 22 Nov 2023 18:41 UTC
4 points
10 comments 1 min read LW link
(www.cricetuscricetus.co.uk)

So you want to save the world? An account in paladinhood

Tamsin Leake 22 Nov 2023 17:40 UTC
65 points
19 comments 15 min read LW link
(carado.moe)

OpenAI: The Battle of the Board

Zvi 22 Nov 2023 17:30 UTC
281 points
83 comments 11 min read LW link
(thezvi.wordpress.com)

Altman returns as OpenAI CEO with new board

Seth Herd 22 Nov 2023 16:04 UTC
6 points
3 comments 1 min read LW link

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)

Joe Carlsmith 22 Nov 2023 15:24 UTC
13 points
0 comments 13 min read LW link