AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space

Bogdan Ionut Cirstea, 14 Sep 2024 23:23 UTC
17 points
1 comment, 1 min read
(arxiv.org)

How you can help pass important AI legislation with 10 minutes of effort

ThomasW, 14 Sep 2024 22:10 UTC
58 points
2 comments, 2 min read

[Question] Calibration training for ‘percentile rankings’?

david reinstein, 14 Sep 2024 21:51 UTC
3 points
0 comments, 2 min read

OpenAI o1, Llama 4, and AlphaZero of LLMs

Vladimir_Nesov, 14 Sep 2024 21:27 UTC
83 points
25 comments, 1 min read

Forever Leaders

Justice Howard, 14 Sep 2024 20:55 UTC
6 points
9 comments, 1 min read

Emergent Authorship: Creativity à la Communing

gswonk, 14 Sep 2024 19:02 UTC
1 point
0 comments, 3 min read

Compression Moves for Prediction

adamShimi, 14 Sep 2024 17:51 UTC
20 points
0 comments, 7 min read
(epistemologicalfascinations.substack.com)

Pay-on-results personal growth: first success

Chipmonk, 14 Sep 2024 3:39 UTC
63 points
5 comments, 4 min read
(chrislakin.blog)

Avoiding the Bog of Moral Hazard for AI

Nathan Helm-Burger, 13 Sep 2024 21:24 UTC
17 points
13 comments, 2 min read

[Question] If I ask an LLM to think step by step, how big are the steps?

ryan_b, 13 Sep 2024 20:30 UTC
7 points
1 comment, 1 min read

Estimating Tail Risk in Neural Networks

Mark Xu, 13 Sep 2024 20:00 UTC
68 points
9 comments, 23 min read
(www.alignment.org)

If-Then Commitments for AI Risk Reduction [by Holden Karnofsky]

habryka, 13 Sep 2024 19:38 UTC
28 points
0 comments, 20 min read
(carnegieendowment.org)

Can startups be impactful in AI safety?

13 Sep 2024 19:00 UTC
12 points
0 comments, 6 min read

I just can’t agree with AI safety. Why am I wrong?

Ya Polkovnik, 13 Sep 2024 17:48 UTC
0 points
5 comments, 2 min read

Keeping it (less than) real: Against ℶ₂ possible people or worlds

quiet_NaN, 13 Sep 2024 17:29 UTC
9 points
0 comments, 9 min read

Why I’m bearish on mechanistic interpretability: the shards are not in the network

tailcalled, 13 Sep 2024 17:09 UTC
22 points
40 comments, 1 min read

Increasing the Span of the Set of Ideas

Jeffrey Heninger, 13 Sep 2024 15:52 UTC
6 points
1 comment, 9 min read

How difficult is AI Alignment?

Sammy Martin, 13 Sep 2024 15:47 UTC
43 points
6 comments, 23 min read

The Great Data Integration Schlep

sarahconstantin, 13 Sep 2024 15:40 UTC
258 points
16 comments, 9 min read
(sarahconstantin.substack.com)

“Real AGI”

Seth Herd, 13 Sep 2024 14:13 UTC
18 points
20 comments, 3 min read

AI, centralization, and the One Ring

owencb, 13 Sep 2024 14:00 UTC
63 points
11 comments, 8 min read
(strangecities.substack.com)

Evidence against Learned Search in a Chess-Playing Neural Network

p.b., 13 Sep 2024 11:59 UTC
56 points
3 comments, 6 min read

My career exploration: Tools for building confidence

lynettebye, 13 Sep 2024 11:37 UTC
17 points
0 comments, 20 min read

Contra papers claiming superhuman AI forecasting

12 Sep 2024 18:10 UTC
180 points
16 comments, 7 min read

OpenAI o1

Zach Stein-Perlman, 12 Sep 2024 17:30 UTC
147 points
41 comments, 1 min read

How to Give in to Threats (without incentivizing them)

Mikhail Samin, 12 Sep 2024 15:55 UTC
52 points
26 comments, 5 min read

Open Problems in AIXI Agent Foundations

Cole Wyeth, 12 Sep 2024 15:38 UTC
41 points
2 comments, 10 min read

On the destruction of America’s best high school

Chris_Leong, 12 Sep 2024 15:30 UTC
−6 points
7 comments, 1 min read
(scottaaronson.blog)

Optimising under arbitrarily many constraint equations

dkl9, 12 Sep 2024 14:59 UTC
6 points
0 comments, 3 min read
(dkl9.net)

AI #81: Alpha Proteo

Zvi, 12 Sep 2024 13:00 UTC
59 points
3 comments, 35 min read
(thezvi.wordpress.com)

[Question] When can I be numerate?

FinalFormal2, 12 Sep 2024 4:05 UTC
25 points
3 comments, 1 min read

A Nonconstructive Existence Proof of Aligned Superintelligence

Roko, 12 Sep 2024 3:20 UTC
0 points
78 comments, 1 min read
(transhumanaxiology.substack.com)

Collapsing the Belief/Knowledge Distinction

Jeremias, 11 Sep 2024 21:24 UTC
−7 points
8 comments, 1 min read

Programming Refusal with Conditional Activation Steering

Bruce W. Lee, 11 Sep 2024 20:57 UTC
41 points
0 comments, 11 min read
(arxiv.org)

Checking public figures on whether they “answered the question” quick analysis from Harris/Trump debate, and a proposal

david reinstein, 11 Sep 2024 20:25 UTC
7 points
4 comments, 1 min read
(open.substack.com)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

11 Sep 2024 19:14 UTC
5 points
1 comment, 5 min read
(newsletter.safe.ai)

Refactoring cryonics as structural brain preservation

Andy_McKenzie, 11 Sep 2024 18:36 UTC
102 points
14 comments, 3 min read

[Question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal

doomyeser, 11 Sep 2024 18:07 UTC
9 points
9 comments, 3 min read

How to discover the nature of sentience, and ethics

Gustavo Ramires, 11 Sep 2024 17:22 UTC
−2 points
4 comments, 5 min read

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities

c.trout, 11 Sep 2024 15:09 UTC
24 points
2 comments, 3 min read

Could Things Be Very Different? How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown, 11 Sep 2024 9:53 UTC
5 points
0 comments, 8 min read
(nonzerosum.games)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch, 11 Sep 2024 4:41 UTC
53 points
11 comments, 3 min read

A necessary Membrane formalism feature

ThomasCederborg, 10 Sep 2024 21:33 UTC
20 points
6 comments, 11 min read

Formalizing the Informal (event invite)

abramdemski, 10 Sep 2024 19:22 UTC
42 points
0 comments, 1 min read

AI #80: Never Have I Ever

Zvi, 10 Sep 2024 17:50 UTC
45 points
20 comments, 39 min read
(thezvi.wordpress.com)

The Best Lay Argument is not a Simple English Yud Essay

J Bostock, 10 Sep 2024 17:34 UTC
247 points
15 comments, 5 min read

Economics Roundup #3

Zvi, 10 Sep 2024 13:50 UTC
44 points
9 comments, 20 min read
(thezvi.wordpress.com)

Amplify is hiring! Work with us to support field-building initiatives through digital marketing

gergogaspar, 10 Sep 2024 8:56 UTC
0 points
1 comment, 4 min read

What bootstraps intelligence?

invertedpassion, 10 Sep 2024 7:11 UTC
2 points
2 comments, 1 min read

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)

Declan Molony, 10 Sep 2024 5:54 UTC
16 points
12 comments, 2 min read