Ablations for “Frontier Models are Capable of In-context Scheming”

17 Dec 2024 23:58 UTC
108 points
1 comment · 2 min read

bending light

Recurrented · 17 Dec 2024 22:40 UTC
1 point
5 comments · 3 min read
(futuring.substack.com)

Careless thinking: A theory of bad thinking

Nathan Young · 17 Dec 2024 18:23 UTC
44 points
17 comments · 9 min read
(nathanpmyoung.substack.com)

The Second Gemini

Zvi · 17 Dec 2024 15:50 UTC
23 points
0 comments · 11 min read
(thezvi.wordpress.com)

AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)

gergogaspar · 17 Dec 2024 14:12 UTC
1 point
0 comments · 2 min read

Everything you care about is in the map

Tahp · 17 Dec 2024 14:05 UTC
17 points
27 comments · 3 min read

Reality is Fractal-Shaped

silentbob · 17 Dec 2024 13:52 UTC
17 points
1 comment · 8 min read

Trying to translate when people talk past each other

Kaj_Sotala · 17 Dec 2024 9:40 UTC
41 points
12 comments · 6 min read
(kajsotala.fi)

What is “wireheading”?

Vishakha · 17 Dec 2024 7:49 UTC
10 points
0 comments · 1 min read
(aisafety.info)

3 What If We Could Map Our Motivation as Channels of Flow?

P. João · 17 Dec 2024 7:47 UTC
3 points
0 comments · 6 min read

2 What if Life Comes with a Natural Calibration to Estimate you?

P. João · 17 Dec 2024 7:47 UTC
1 point
0 comments · 10 min read

1 What If We Rebuild Motivation with the Fermi ESTIMATion?

P. João · 17 Dec 2024 7:46 UTC
5 points
0 comments · 3 min read

Where do you put your ideas?

CstineSublime · 17 Dec 2024 7:26 UTC
8 points
20 comments · 1 min read

Elevating Air Purifiers

jefftk · 17 Dec 2024 1:40 UTC
25 points
0 comments · 1 min read
(www.jefftk.com)

0 Motivation Mapping through Information Theory

P. João · 16 Dec 2024 23:17 UTC
8 points
0 comments · 28 min read

A dataset of questions on decision-theoretic reasoning in Newcomb-like problems

16 Dec 2024 22:42 UTC
47 points
1 comment · 2 min read
(arxiv.org)

A practical guide to tiling the universe with hedonium

Vittu Perkele · 16 Dec 2024 21:25 UTC
−9 points
1 comment · 1 min read
(perkeleperusing.substack.com)

AI Safety Seed Funding Network—Join as a Donor or Investor

Alexandra Bos · 16 Dec 2024 19:30 UTC
30 points
0 comments · 1 min read

Is this a better way to do matchmaking?

Chipmonk · 16 Dec 2024 19:06 UTC
9 points
4 comments · 1 min read

I read every major AI lab’s safety plan so you don’t have to

sarahhw · 16 Dec 2024 18:51 UTC
20 points
0 comments · 12 min read
(longerramblings.substack.com)

Grokking revisited: reverse engineering grokking modulo addition in LSTM

16 Dec 2024 18:48 UTC
4 points
0 comments · 6 min read

[Question] Do infinite alternatives make AI alignment impossible?

Dakara · 16 Dec 2024 18:11 UTC
11 points
2 comments · 1 min read

Progress links and short notes, 2024-12-16

jasoncrawford · 16 Dec 2024 17:24 UTC
7 points
0 comments · 2 min read
(newsletter.rootsofprogress.org)

Effective Altruism FAQ

omnizoid · 16 Dec 2024 16:27 UTC
0 points
7 comments · 12 min read

Variably compressibly studies are fun

dkl9 · 16 Dec 2024 16:00 UTC
0 points
0 comments · 2 min read
(dkl9.net)

AIs Will Increasingly Attempt Shenanigans

Zvi · 16 Dec 2024 15:20 UTC
108 points
2 comments · 26 min read
(thezvi.wordpress.com)

Testing which LLM architectures can do hidden serial reasoning

Filip Sondej · 16 Dec 2024 13:48 UTC
80 points
9 comments · 4 min read

NeuroAI for AI safety: A Differential Path

16 Dec 2024 13:17 UTC
14 points
0 comments · 7 min read
(arxiv.org)

Circling as practice for “just be yourself”

Kaj_Sotala · 16 Dec 2024 7:40 UTC
86 points
5 comments · 4 min read
(kajsotala.fi)

Reanalyzing the 2023 Expert Survey on Progress in AI

AI Impacts · 16 Dec 2024 6:10 UTC
8 points
0 comments · 1 min read
(blog.aiimpacts.org)

Ideas for benchmarking LLM creativity

gwern · 16 Dec 2024 5:18 UTC
53 points
11 comments · 1 min read
(gwern.net)

Comparing the AirFanta 3Pro to the Coway AP-1512

jefftk · 16 Dec 2024 1:40 UTC
13 points
0 comments · 1 min read
(www.jefftk.com)

[Question] are IQ tests a good measure of intelligence?

KvmanThinking · 15 Dec 2024 23:06 UTC
0 points
5 comments · 1 min read

Madison Secular Solstice

svfritz · 15 Dec 2024 21:52 UTC
1 point
0 comments · 1 min read

[Question] Is AI alignment a purely functional property?

Roko · 15 Dec 2024 21:42 UTC
13 points
7 comments · 1 min read

[Question] How counterfactual are logical counterfactuals?

Donald Hobson · 15 Dec 2024 21:16 UTC
11 points
10 comments · 1 min read

Debunking the myth of safe AI

henophilia · 15 Dec 2024 17:44 UTC
−11 points
7 comments · 1 min read
(henophilia.substack.com)

Introducing Avatarism: A Rational Framework for Building actual Heaven

ratiba ro · 15 Dec 2024 17:17 UTC
2 points
2 comments · 2 min read

A Public Choice Take on Effective Altruism

vaishnav92 · 15 Dec 2024 16:58 UTC
9 points
4 comments · 3 min read
(www.optimaloutliers.com)

World Models I’m Currently Building

temporary · 15 Dec 2024 16:29 UTC
5 points
1 comment · 1 min read
(samuelshadrach.com)

Dress Up For Secular Solstice

Gordon H.S. · 15 Dec 2024 16:28 UTC
33 points
13 comments · 7 min read

Remap your caps lock key

bilalchughtai · 15 Dec 2024 14:03 UTC
82 points
17 comments · 1 min read

Effective Evil’s AI Misalignment Plan

lsusr · 15 Dec 2024 7:39 UTC
78 points
9 comments · 3 min read

Write Good Enough Code, Quickly

Oliver Daniels · 15 Dec 2024 4:45 UTC
19 points
10 comments · 8 min read

How to Edit an Essay into a Solstice Speech?

Czynski · 15 Dec 2024 4:30 UTC
5 points
1 comment · 1 min read
(thepdv.wordpress.com)

How Your Physiology Affects the Mind’s Projection Fallacy

YanLyutnev · 14 Dec 2024 21:10 UTC
2 points
0 comments · 6 min read

Introducing the Evidence Color Wheel

Larry Lee · 14 Dec 2024 16:08 UTC
6 points
0 comments · 3 min read

An Illustrated Summary of “Robust Agents Learn Causal World Model”

Dalcy · 14 Dec 2024 15:02 UTC
57 points
2 comments · 10 min read

Best-of-N Jailbreaking

14 Dec 2024 4:58 UTC
78 points
6 comments · 2 min read
(arxiv.org)

D&D.Sci Dungeonbuilding: the Dungeon Tournament

aphyer · 14 Dec 2024 4:30 UTC
47 points
14 comments · 3 min read