The slingshot helps with learning

Wilson Wu · 31 Oct 2024 23:18 UTC
33 points
0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

31 Oct 2024 23:06 UTC
55 points
3 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead · 31 Oct 2024 22:00 UTC
11 points
0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec · 31 Oct 2024 17:25 UTC
1 point
0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi · 31 Oct 2024 15:00 UTC
46 points
5 comments · 77 min read · LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

31 Oct 2024 12:01 UTC
193 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong · 31 Oct 2024 9:21 UTC
2 points
0 comments · 1 min read · LW link
(aiimpacts.org)

What TMS is like

Sable · 31 Oct 2024 0:44 UTC
206 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo · 31 Oct 2024 0:09 UTC
3 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · 30 Oct 2024 22:50 UTC
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · 30 Oct 2024 21:03 UTC
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
104 points
23 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · 30 Oct 2024 18:31 UTC
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · 30 Oct 2024 17:41 UTC
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · 30 Oct 2024 13:50 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · 30 Oct 2024 11:00 UTC
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth · 30 Oct 2024 6:10 UTC
89 points
44 comments · 4 min read · LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · 29 Oct 2024 22:57 UTC
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · 29 Oct 2024 20:40 UTC
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · 29 Oct 2024 15:21 UTC
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · 29 Oct 2024 13:50 UTC
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · 29 Oct 2024 13:36 UTC
51 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · 29 Oct 2024 13:13 UTC
19 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · 29 Oct 2024 12:41 UTC
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · 29 Oct 2024 12:16 UTC
36 points
8 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · 29 Oct 2024 7:10 UTC
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · 29 Oct 2024 3:02 UTC
53 points
16 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · 29 Oct 2024 1:21 UTC
47 points
13 comments · 6 min read · LW link

Gwern: Why So Few Matt Levines?

kave · 29 Oct 2024 1:07 UTC
78 points
10 comments · 1 min read · LW link
(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn · 28 Oct 2024 23:34 UTC
7 points
0 comments · 1 min read · LW link
(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen · 28 Oct 2024 23:24 UTC
85 points
4 comments · 1 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen · 28 Oct 2024 21:44 UTC
7 points
0 comments · 1 min read · LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L · 28 Oct 2024 21:08 UTC
18 points
0 comments · 14 min read · LW link
(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L · 28 Oct 2024 21:07 UTC
18 points
0 comments · 8 min read · LW link
(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L · 28 Oct 2024 21:02 UTC
29 points
0 comments · 15 min read · LW link
(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · 28 Oct 2024 20:17 UTC
94 points
5 comments · 4 min read · LW link
(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · 28 Oct 2024 19:45 UTC
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen · 28 Oct 2024 18:39 UTC
7 points
0 comments · 1 min read · LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

28 Oct 2024 17:10 UTC
40 points
3 comments · 30 min read · LW link
(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman · 28 Oct 2024 17:00 UTC
22 points
4 comments · 3 min read · LW link
(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking · 28 Oct 2024 16:40 UTC
7 points
8 comments · 1 min read · LW link

~80 Interesting Questions about Foundation Model Agent Safety

28 Oct 2024 16:37 UTC
46 points
4 comments · 15 min read · LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI. Plus, AI and Job Displacement, and AI Takes Over the Nobels

28 Oct 2024 16:03 UTC
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Death notes - 7 thoughts on death

Nathan Young · 28 Oct 2024 15:01 UTC
26 points
1 comment · 5 min read · LW link
(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI · 28 Oct 2024 14:48 UTC
27 points
0 comments · 10 min read · LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph · 28 Oct 2024 14:41 UTC
19 points
5 comments · 15 min read · LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere · 28 Oct 2024 13:27 UTC
55 points
4 comments · 15 min read · LW link
(blog.sentinel-team.org)