What can we learn from insecure domains?

Logan Zoellner · 1 Nov 2024 23:53 UTC
14 points
21 comments · 1 min read · LW link

Science advances one funeral at a time

1 Nov 2024 23:06 UTC
92 points
9 comments · 2 min read · LW link

The Cartesian Crisis

mindprison · 1 Nov 2024 23:02 UTC
−5 points
2 comments · 2 min read · LW link

Composition Circuits in Vision Transformers (Hypothesis)

phenomanon · 1 Nov 2024 22:16 UTC
1 point
0 comments · 3 min read · LW link

SAE Probing: What is it good for? Absolutely something!

1 Nov 2024 19:23 UTC
31 points
0 comments · 11 min read · LW link

[Question] Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion

Wenitte Apiou · 1 Nov 2024 18:56 UTC
8 points
25 comments · 1 min read · LW link

Educational CAI: Aligning a Language Model with Pedagogical Theories

Bharath Puranam · 1 Nov 2024 18:55 UTC
5 points
1 comment · 13 min read · LW link

Prediction markets and Taxes

Edmund Nelson · 1 Nov 2024 17:39 UTC
10 points
7 comments · 1 min read · LW link

Dentistry, Oral Surgeons, and the Inefficiency of Small Markets

GeneSmith · 1 Nov 2024 17:26 UTC
76 points
16 comments · 5 min read · LW link

Live Machinery: An Interface Design Philosophy for Wholesome AI Futures

Sahil · 1 Nov 2024 17:24 UTC
45 points
3 comments · 35 min read · LW link

Seeking Collaborators

abramdemski · 1 Nov 2024 17:13 UTC
57 points
15 comments · 7 min read · LW link

Complete Feedback

abramdemski · 1 Nov 2024 16:58 UTC
23 points
7 comments · 3 min read · LW link

Levers for Biological Progress - A Response to “Machines of Loving Grace”

Niko_McCarty · 1 Nov 2024 16:35 UTC
15 points
0 comments · 20 min read · LW link
(www.asimov.press)

2024 Unofficial LW Community Census, Request for Comments

Screwtape · 1 Nov 2024 16:34 UTC
23 points
32 comments · 3 min read · LW link

[Question] When engaging with a large amount of resources during a literature review, how do you prevent yourself from becoming overwhelmed?

corruptedCatapillar · 1 Nov 2024 7:29 UTC
25 points
2 comments · 3 min read · LW link

(draft) Cyborg software should be open (?)

AtillaYasar · 1 Nov 2024 7:24 UTC
4 points
5 comments · 3 min read · LW link

Another UFO Bet

codyz · 1 Nov 2024 1:55 UTC
6 points
11 comments · 1 min read · LW link

Trading Candy

jefftk · 1 Nov 2024 1:10 UTC
28 points
4 comments · 1 min read · LW link
(www.jefftk.com)

JargonBot Beta Test

Raemon · 1 Nov 2024 1:05 UTC
84 points
55 comments · 6 min read · LW link

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

1 Nov 2024 0:10 UTC
18 points
0 comments · 6 min read · LW link
(far.ai)

The slingshot helps with learning

Wilson Wu · 31 Oct 2024 23:18 UTC
33 points
0 comments · 8 min read · LW link

Toward Safety Case Inspired Basic Research

31 Oct 2024 23:06 UTC
55 points
2 comments · 13 min read · LW link

Spooky Recommendation System Scaling

phdead · 31 Oct 2024 22:00 UTC
11 points
0 comments · 4 min read · LW link

‘Meta’, ‘mesa’, and mountains

Lorec · 31 Oct 2024 17:25 UTC
1 point
0 comments · 3 min read · LW link

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI #88: Thanks for the Memos

Zvi · 31 Oct 2024 15:00 UTC
46 points
5 comments · 77 min read · LW link
(thezvi.wordpress.com)

The Compendium, A full argument about extinction risk from AGI

31 Oct 2024 12:01 UTC
192 points
52 comments · 2 min read · LW link
(www.thecompendium.ai)

Some Preliminary Notes on the Promise of a Wisdom Explosion

Chris_Leong · 31 Oct 2024 9:21 UTC
2 points
0 comments · 1 min read · LW link
(aiimpacts.org)

What TMS is like

Sable · 31 Oct 2024 0:44 UTC
207 points
23 comments · 6 min read · LW link
(affablyevil.substack.com)

AI Safety at the Frontier: Paper Highlights, October ’24

gasteigerjo · 31 Oct 2024 0:09 UTC
3 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde · 30 Oct 2024 22:50 UTC
27 points
0 comments · 12 min read · LW link

Generic advice caveats

Saul Munn · 30 Oct 2024 21:03 UTC
27 points
1 comment · 3 min read · LW link
(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt · 30 Oct 2024 20:13 UTC
103 points
20 comments · 1 min read · LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner · 30 Oct 2024 18:31 UTC
46 points
8 comments · 4 min read · LW link

[Question] How might language influence how an AI “thinks”?

bodry · 30 Oct 2024 17:41 UTC
3 points
0 comments · 1 min read · LW link

Motivation control

Joe Carlsmith · 30 Oct 2024 17:15 UTC
45 points
7 comments · 52 min read · LW link

Updating the NAO Simulator

jefftk · 30 Oct 2024 13:50 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi · 30 Oct 2024 11:00 UTC
65 points
11 comments · 11 min read · LW link
(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth · 30 Oct 2024 6:10 UTC
89 points
43 comments · 4 min read · LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand · 29 Oct 2024 22:57 UTC
1 point
0 comments · 8 min read · LW link

Gothenburg LW/ACX meetup

Stefan · 29 Oct 2024 20:40 UTC
2 points
0 comments · 1 min read · LW link

The Alignment Trap: AI Safety as Path to Power

crispweed · 29 Oct 2024 15:21 UTC
57 points
17 comments · 5 min read · LW link
(upcoder.com)

Housing Roundup #10

Zvi · 29 Oct 2024 13:50 UTC
32 points
2 comments · 32 min read · LW link
(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes · 29 Oct 2024 13:36 UTC
50 points
2 comments · 16 min read · LW link

Review: “The Case Against Reality”

David Gross · 29 Oct 2024 13:13 UTC
19 points
9 comments · 5 min read · LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob · 29 Oct 2024 12:41 UTC
12 points
0 comments · 9 min read · LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean · 29 Oct 2024 12:16 UTC
36 points
8 comments · 26 min read · LW link

AI #87: Staying in Character

Zvi · 29 Oct 2024 7:10 UTC
57 points
3 comments · 33 min read · LW link
(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger · 29 Oct 2024 3:02 UTC
33 points
14 comments · 20 min read · LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer · 29 Oct 2024 1:21 UTC
47 points
12 comments · 6 min read · LW link