Social Balance through Embracing Social Credit

dhruvv · 26 Jul 2023 20:07 UTC
−39 points
9 comments · 3 min read

Why no Roman Industrial Revolution?

jasoncrawford · 26 Jul 2023 19:34 UTC
62 points
30 comments · 3 min read
(rootsofprogress.org)

Why you can’t treat decidability and complexity as a constant (Post #1)

Noosphere89 · 26 Jul 2023 17:54 UTC
6 points
13 comments · 5 min read

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”

Harrison Fell · 26 Jul 2023 17:34 UTC
1 point
0 comments · 10 min read

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

26 Jul 2023 17:02 UTC
96 points
19 comments · 1 min read · 1 review

Neuronpedia

Johnny Lin · 26 Jul 2023 16:29 UTC
135 points
51 comments · 2 min read
(neuronpedia.org)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC
27 points
0 comments · 4 min read
(blog.google)

Podcasts: Future of Life Institute, Breakthrough Science Summit panel

jasoncrawford · 26 Jul 2023 14:28 UTC
8 points
0 comments · 1 min read
(rootsofprogress.org)

Llama We Doing This Again?

Zvi · 26 Jul 2023 13:00 UTC
48 points
3 comments · 16 min read
(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC
32 points
1 comment · 3 min read
(www.anthropic.com)

The First Room-Temperature Ambient-Pressure Superconductor

Annapurna · 26 Jul 2023 2:27 UTC
35 points
28 comments · 1 min read
(arxiv.org)

Underwater Torture Chambers: The Horror Of Fish Farming

omnizoid · 26 Jul 2023 0:27 UTC
81 points
50 comments · 10 min read · 1 review

Contra Alexander on the Bitter Lesson and IQ

Andrew Keenan Richardson · 26 Jul 2023 0:07 UTC
9 points
1 comment · 4 min read
(mechanisticmind.com)

Overcoming the MWC

Mark Freed · 25 Jul 2023 17:31 UTC
3 points
0 comments · 3 min read

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · 25 Jul 2023 17:30 UTC
11 points
6 comments · 2 min read

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

25 Jul 2023 16:58 UTC
6 points
0 comments · 6 min read
(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · 25 Jul 2023 16:53 UTC
7 points
0 comments · 1 min read

Thoughts on Loss Landscapes and why Deep Learning works

beren · 25 Jul 2023 16:41 UTC
53 points
4 comments · 18 min read

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · 25 Jul 2023 16:29 UTC
7 points
0 comments · 12 min read

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · 25 Jul 2023 14:32 UTC
−18 points
2 comments · 2 min read
(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC
27 points
0 comments · 19 min read
(docs.google.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC
104 points
1 comment · 10 min read
(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC
−9 points
0 comments · 4 min read

How LLMs are and are not myopic

janus · 25 Jul 2023 2:19 UTC
134 points
16 comments · 8 min read

Secure Hand Holding

jefftk · 25 Jul 2023 1:40 UTC
28 points
43 comments · 1 min read
(www.jefftk.com)

Open problems in activation engineering

24 Jul 2023 19:46 UTC
51 points
2 comments · 1 min read
(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · 24 Jul 2023 18:55 UTC
8 points
2 comments · 2 min read

Optimizing For Approval And Disapproval

Thoth Hermes · 24 Jul 2023 18:46 UTC
−1 points
0 comments · 12 min read
(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · 24 Jul 2023 17:53 UTC
10 points
10 comments · 3 min read

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · 24 Jul 2023 16:56 UTC
42 points
27 comments · 5 min read

Anticipation in LLMs

derek shiller · 24 Jul 2023 15:53 UTC
6 points
0 comments · 13 min read

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · 24 Jul 2023 15:38 UTC
−10 points
6 comments · 2 min read
(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · 24 Jul 2023 13:02 UTC
76 points
1 comment · 8 min read

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC
145 points
12 comments · 7 min read

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · 24 Jul 2023 10:07 UTC
12 points
0 comments · 7 min read
(time.com)

Cryonics and Regret

MvB · 24 Jul 2023 9:16 UTC
187 points
35 comments · 2 min read · 1 review

Rationality !== Winning

Raemon · 24 Jul 2023 2:53 UTC
163 points
51 comments · 4 min read

[Question] Which rationality posts are begging for further practical development?

LoganStrohl · 23 Jul 2023 22:22 UTC
60 points
17 comments · 1 min read

Please speak unpredictably

dkl9 · 23 Jul 2023 22:09 UTC
10 points
16 comments · 1 min read
(dkl9.net)

QAPR 5: grokking is maybe not *that* big a deal?

Quintin Pope · 23 Jul 2023 20:14 UTC
114 points
15 comments · 9 min read

My favorite AI governance research this year so far

Zach Stein-Perlman · 23 Jul 2023 16:30 UTC
26 points
1 comment · 7 min read
(blog.aiimpacts.org)

“Justice, Cherryl.”

Zack_M_Davis · 23 Jul 2023 16:16 UTC
85 points
21 comments · 9 min read · 1 review

Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive

Justausername · 23 Jul 2023 16:08 UTC
4 points
1 comment · 3 min read

Autogynephilia discourse is so absurdly bad on all sides

tailcalled · 23 Jul 2023 13:12 UTC
44 points
24 comments · 2 min read

Examples of Prompts that Make GPT-4 Output Falsehoods

22 Jul 2023 20:21 UTC
21 points
5 comments · 6 min read

Think like a consultant not a salesperson

Adam Zerner · 22 Jul 2023 19:31 UTC
16 points
5 comments · 2 min read

Optimization, loss set at variance in RL

Clairstan · 22 Jul 2023 18:25 UTC
1 point
1 comment · 3 min read

Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs

davidad · 22 Jul 2023 18:09 UTC
80 points
2 comments · 2 min read

Apollo Neuro Follow Up

Elizabeth · 22 Jul 2023 17:20 UTC
28 points
0 comments · 1 min read
(acesounderglass.com)

Expert trap – Ways out (Part 3 of 3)

Paweł Sysiak · 22 Jul 2023 13:06 UTC
4 points
0 comments · 9 min read