Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · 30 Oct 2024 22:50 UTC
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · 30 Oct 2024 21:03 UTC
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
103 points
20 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · 30 Oct 2024 18:31 UTC
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · 30 Oct 2024 17:41 UTC
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · 30 Oct 2024 13:50 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · 30 Oct 2024 11:00 UTC
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth · 30 Oct 2024 6:10 UTC
89 points
43 comments · 4 min read · LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · 29 Oct 2024 22:57 UTC
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · 29 Oct 2024 20:40 UTC
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · 29 Oct 2024 15:21 UTC
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · 29 Oct 2024 13:50 UTC
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · 29 Oct 2024 13:36 UTC
50 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · 29 Oct 2024 13:13 UTC
19 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · 29 Oct 2024 12:41 UTC
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · 29 Oct 2024 12:16 UTC
36 points
8 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · 29 Oct 2024 7:10 UTC
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · 29 Oct 2024 3:02 UTC
33 points
14 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · 29 Oct 2024 1:21 UTC
47 points
12 comments · 6 min read · LW link

Gwern: Why So Few Matt Levines?

kave · 29 Oct 2024 1:07 UTC
78 points
10 comments · 1 min read · LW link
(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn · 28 Oct 2024 23:34 UTC
7 points
0 comments · 1 min read · LW link
(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen · 28 Oct 2024 23:24 UTC
85 points
4 comments · 1 min read · LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith · 28 Oct 2024 21:57 UTC
54 points
5 comments · 32 min read · LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen · 28 Oct 2024 21:44 UTC
7 points
0 comments · 1 min read · LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L · 28 Oct 2024 21:08 UTC
14 points
0 comments · 14 min read · LW link
(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L · 28 Oct 2024 21:07 UTC
18 points
0 comments · 8 min read · LW link
(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L · 28 Oct 2024 21:02 UTC
27 points
0 comments · 15 min read · LW link
(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi · 28 Oct 2024 20:17 UTC
93 points
5 comments · 4 min read · LW link
(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis · 28 Oct 2024 19:45 UTC
20 points
2 comments · 33 min read · LW link
(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen · 28 Oct 2024 18:39 UTC
7 points
0 comments · 1 min read · LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

28 Oct 2024 17:10 UTC
40 points
3 comments · 30 min read · LW link
(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman · 28 Oct 2024 17:00 UTC
22 points
4 comments · 3 min read · LW link
(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking · 28 Oct 2024 16:40 UTC
7 points
8 comments · 1 min read · LW link

~80 Interesting Questions about Foundation Model Agent Safety

28 Oct 2024 16:37 UTC
45 points
4 comments · 15 min read · LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels

28 Oct 2024 16:03 UTC
6 points
0 comments · 6 min read · LW link
(newsletter.safe.ai)

Death notes - 7 thoughts on death

Nathan Young · 28 Oct 2024 15:01 UTC
26 points
1 comment · 5 min read · LW link
(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI · 28 Oct 2024 14:48 UTC
27 points
0 comments · 10 min read · LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph · 28 Oct 2024 14:41 UTC
19 points
5 comments · 15 min read · LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere · 28 Oct 2024 13:27 UTC
54 points
4 comments · 15 min read · LW link
(blog.sentinel-team.org)

Care Doesn’t Scale

stavros · 28 Oct 2024 11:57 UTC
27 points
1 comment · 1 min read · LW link
(stevenscrawls.com)

Your memory eventually drives confidence in each hypothesis to 1 or 0

Crazy philosopher · 28 Oct 2024 9:00 UTC
3 points
6 comments · 1 min read · LW link

San Francisco ACX Meetup “First Saturday”

Nate Sternberg · 28 Oct 2024 5:05 UTC
3 points
0 comments · 1 min read · LW link

Nerdtrition: simple diets via spreadsheet abuse

dkl9 · 27 Oct 2024 21:45 UTC
8 points
0 comments · 3 min read · LW link
(dkl9.net)

AGI Fermi Paradox

jrincayc · 27 Oct 2024 20:14 UTC
0 points
2 comments · 2 min read · LW link

Substituting Talkbox for Breath Controller

jefftk · 27 Oct 2024 19:10 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

27 Oct 2024 18:46 UTC
39 points
4 comments · 5 min read · LW link

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)

spencerg · 27 Oct 2024 17:34 UTC
16 points
0 comments · 1 min read · LW link

Interview with Bill O’Rourke - Russian Corruption, Putin, Applied Ethics, and More

JohnGreer · 27 Oct 2024 17:11 UTC
3 points
0 comments · 6 min read · LW link

On Shifgrethor

JustisMills · 27 Oct 2024 15:30 UTC
66 points
18 comments · 2 min read · LW link
(justismills.substack.com)