Third-party testing as a key ingredient of AI policy

Zac Hatfield-Dodds · 25 Mar 2024 22:40 UTC
11 points
1 comment · 12 min read · LW link
(www.anthropic.com)

Idea: Safe Fallback Regulations for Widely Deployed AI Systems

Aaron_Scher · 25 Mar 2024 21:27 UTC
4 points
0 comments · 6 min read · LW link

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders

25 Mar 2024 21:17 UTC
92 points
7 comments · 7 min read · LW link

Testing ChatGPT for cell type recognition

Metacelsus · 25 Mar 2024 19:59 UTC
7 points
2 comments · 3 min read · LW link
(denovo.substack.com)

Should rationalists be spiritual / Spirituality as overcoming delusion

25 Mar 2024 16:48 UTC
49 points
57 comments · 29 min read · LW link

Photo Curation Approach

jefftk · 25 Mar 2024 15:10 UTC
9 points
3 comments · 2 min read · LW link
(www.jefftk.com)

On attunement

Joe Carlsmith · 25 Mar 2024 12:47 UTC
98 points
8 comments · 22 min read · LW link

On Lex Fridman’s Second Podcast with Altman

Zvi · 25 Mar 2024 12:20 UTC
51 points
10 comments · 10 min read · LW link
(thezvi.wordpress.com)

On the Confusion between Inner and Outer Misalignment

Chris_Leong · 25 Mar 2024 11:59 UTC
17 points
10 comments · 1 min read · LW link

A Bit For You

Ronak_Mehta · 24 Mar 2024 22:18 UTC
0 points
0 comments · 2 min read · LW link
(ronakrm.github.io)

All About Concave and Convex Agents

mako yass · 24 Mar 2024 21:37 UTC
63 points
23 comments · 8 min read · LW link

Do not delete your misaligned AGI.

mako yass · 24 Mar 2024 21:37 UTC
62 points
13 comments · 3 min read · LW link

[Question] Define “Agent” (Embedded)

Apollonia · 24 Mar 2024 20:14 UTC
10 points
1 comment · 1 min read · LW link

[Question] Could LLMs Help Generate New Concepts in Human Language?

Pekka Lampelto · 24 Mar 2024 20:13 UTC
10 points
4 comments · 2 min read · LW link

Wittgenstein and the Private Language Argument

TMFOW · 24 Mar 2024 20:06 UTC
4 points
0 comments · 14 min read · LW link
(tmfow.substack.com)

Self-Play By Analogy

Amica Terra · 24 Mar 2024 20:06 UTC
−2 points
2 comments · 7 min read · LW link

Can quantised autoencoders find and interpret circuits in language models?

charlieoneill · 24 Mar 2024 20:05 UTC
28 points
4 comments · 24 min read · LW link

Mandolin Harp Sensor Placement

jefftk · 24 Mar 2024 18:40 UTC
11 points
0 comments · 1 min read · LW link
(www.jefftk.com)

AI Alignment and the Classical Humanist Tradition

PeteJ · 24 Mar 2024 13:37 UTC
−1 points
4 comments · 2 min read · LW link

UNGA Resolution on AI: 5 Key Takeaways Looking to Future Policy

Heramb · 24 Mar 2024 12:23 UTC
3 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

[Question] Are (Motor)sports like F1 a good thing to calibrate estimates against?

CstineSublime · 24 Mar 2024 9:07 UTC
4 points
2 comments · 1 min read · LW link

Nuclear Quantum Immortality Hacking

Nezek · 23 Mar 2024 22:08 UTC
−3 points
2 comments · 2 min read · LW link

As Many Ideas

Screwtape · 23 Mar 2024 18:55 UTC
7 points
0 comments · 1 min read · LW link

My Detailed Notes & Commentary from Secular Solstice

Jeffrey Heninger · 23 Mar 2024 18:48 UTC
35 points
16 comments · 13 min read · LW link

General Thoughts on Secular Solstice

Jeffrey Heninger · 23 Mar 2024 18:48 UTC
100 points
60 comments · 8 min read · LW link

How to make food/water testing cheaper/more scalable? [eg for purity/toxin testing]

Alex K. Chen (parrot) · 23 Mar 2024 5:28 UTC
9 points
2 comments · 1 min read · LW link

Prototyping Pluck Sensors

jefftk · 23 Mar 2024 1:30 UTC
9 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Dangers of Closed-Loop AI

Gordon Seidoh Worley · 22 Mar 2024 23:52 UTC
35 points
9 comments · 2 min read · LW link

Why The Insects Scream

omnizoid · 22 Mar 2024 19:47 UTC
4 points
11 comments · 9 min read · LW link

What does “autodidact” mean?

bhauth · 22 Mar 2024 18:37 UTC
22 points
19 comments · 1 min read · LW link

[Linkpost] Vague Verbiage in Forecasting

trevor · 22 Mar 2024 18:05 UTC
11 points
9 comments · 3 min read · LW link
(goodjudgment.com)

Wolf and Rabbit

Richard Henage · 22 Mar 2024 17:20 UTC
14 points
4 comments · 1 min read · LW link

AI Model Registries: A Regulatory Review

22 Mar 2024 16:04 UTC
9 points
0 comments · 6 min read · LW link

Video and transcript of presentation on Scheming AIs

Joe Carlsmith · 22 Mar 2024 15:52 UTC
32 points
1 comment · 32 min read · LW link

Benchmarking LLM Agents on Kaggle Competitions

aogara · 22 Mar 2024 13:09 UTC
15 points
4 comments · 5 min read · LW link

American Acceleration vs Development

Maxwell Tabarrok · 22 Mar 2024 13:01 UTC
1 point
0 comments · 4 min read · LW link
(www.maximum-progress.com)

Transformative AI and Scenario Planning for AI X-risk

22 Mar 2024 9:38 UTC
15 points
0 comments · 8 min read · LW link

The Pyromaniacs

Ted Sanders · 22 Mar 2024 6:55 UTC
−3 points
1 comment · 2 min read · LW link

Vernor Vinge, who coined the term “Technological Singularity”, dies at 79

Kaj_Sotala · 21 Mar 2024 22:14 UTC
149 points
25 comments · 1 min read · LW link
(arstechnica.com)

ChatGPT can learn indirect control

Raymond D · 21 Mar 2024 21:11 UTC
213 points
27 comments · 1 min read · LW link

“Deep Learning” Is Function Approximation

Zack_M_Davis · 21 Mar 2024 17:50 UTC
98 points
28 comments · 10 min read · LW link
(zackmdavis.net)

A Teacher vs. Everyone Else

ronak69 · 21 Mar 2024 17:45 UTC
41 points
8 comments · 2 min read · LW link

Static vs Dynamic Alignment

Gracie Green · 21 Mar 2024 17:44 UTC
5 points
0 comments · 29 min read · LW link

On green

Joe Carlsmith · 21 Mar 2024 17:38 UTC
266 points
35 comments · 31 min read · LW link

Comparing Alignment to other AGI interventions: Extensions and analysis

Martín Soto · 21 Mar 2024 17:30 UTC
7 points
0 comments · 4 min read · LW link

The Comcast Problem

RamblinDash · 21 Mar 2024 16:46 UTC
1 point
15 comments · 1 min read · LW link

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation

Benjamin Sturgeon · 21 Mar 2024 12:32 UTC
50 points
8 comments · 19 min read · LW link

AI #56: Blackwell That Ends Well

Zvi · 21 Mar 2024 12:10 UTC
34 points
16 comments · 68 min read · LW link
(thezvi.wordpress.com)

An Affordable CO2 Monitor

Pretentious Penguin · 21 Mar 2024 3:06 UTC
28 points
1 comment · 1 min read · LW link

DeepMind: Evaluating Frontier Models for Dangerous Capabilities

Zach Stein-Perlman · 21 Mar 2024 3:00 UTC
61 points
8 comments · 1 min read · LW link
(arxiv.org)