Some miscellaneous thoughts on ChatGPT, stories, and mechanical interpretability

Bill Benzon 4 Feb 2023 19:35 UTC
2 points
0 comments 3 min read LW link

O(“AGI Safety”)>O(“Stop Tyrants”)

AnthonyRepetto 4 Feb 2023 18:38 UTC
−4 points
11 comments 1 min read LW link

Monthly Doom Argument Threads? Doom Argument Wiki?

LVSN 4 Feb 2023 16:59 UTC
3 points
0 comments 1 min read LW link

The Future of Structured Self Improvement

Evenflair 4 Feb 2023 16:02 UTC
27 points
4 comments 1 min read LW link
(guildoftherose.org)

Empathy as a natural consequence of learnt reward models

beren 4 Feb 2023 15:35 UTC
46 points
26 comments 13 min read LW link

Mech Interp Project Advising Call: Memorisation in GPT-2 Small

Neel Nanda 4 Feb 2023 14:17 UTC
7 points
0 comments 1 min read LW link

Do IQ tests measure intelligence? - A prediction market on my future beliefs about the topic

tailcalled 4 Feb 2023 11:19 UTC
1 point
10 comments 1 min read LW link
(manifold.markets)

AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda

DanielFilan 4 Feb 2023 3:00 UTC
45 points
0 comments 117 min read LW link

The 2/3 rule for multi-factor authentication

RomanHauksson 4 Feb 2023 2:57 UTC
4 points
0 comments 1 min read LW link
(roman.computer)

Path-Dependence in ChatGPT’s Political Outputs

lsusr 4 Feb 2023 2:02 UTC
28 points
4 comments 4 min read LW link

Fucking Goddamn Basics of Rationalist Discourse

LoganStrohl 4 Feb 2023 1:47 UTC
321 points
103 comments 1 min read LW link 3 reviews

Small Talk is Good, Actually

Gordon Seidoh Worley 4 Feb 2023 0:38 UTC
51 points
9 comments 3 min read LW link

Update on Book Review Dominant Assurance Contract

Arjun Panickssery 3 Feb 2023 23:16 UTC
9 points
0 comments 1 min read LW link

[Question] 2+2=π√2+n

Logan Zoellner 3 Feb 2023 22:27 UTC
16 points
15 comments 1 min read LW link

[Question] If I encounter a capabilities paper that kinda spooks me, what should I do with it?

the gears to ascension 3 Feb 2023 21:37 UTC
28 points
8 comments 1 min read LW link

[Question] What Are The Preconditions/Prerequisites for Asymptotic Analysis?

DragonGod 3 Feb 2023 21:26 UTC
8 points
2 comments 1 min read LW link

[Linkpost] Google invested $300M in Anthropic in late 2022

Akash 3 Feb 2023 19:13 UTC
73 points
14 comments 1 min read LW link
(www.ft.com)

Many AI governance proposals have a tradeoff between usefulness and feasibility

3 Feb 2023 18:49 UTC
22 points
2 comments 2 min read LW link

Reply to Duncan Sabien on Strawmanning

Zack_M_Davis 3 Feb 2023 17:57 UTC
42 points
11 comments 4 min read LW link

Semi-rare plain language words that are great to remember

LVSN 3 Feb 2023 16:33 UTC
4 points
7 comments 1 min read LW link

[Question] What qualities does an AGI need to have to realize the risk of false vacuum, without hardcoding physics theories into it?

RationalSieve 3 Feb 2023 16:00 UTC
1 point
4 comments 1 min read LW link

Housing and Transit Roundup #3

Zvi 3 Feb 2023 15:10 UTC
21 points
6 comments 16 min read LW link
(thezvi.wordpress.com)

Taboo P(doom)

NathanBarnard 3 Feb 2023 10:37 UTC
14 points
10 comments 1 min read LW link

ChatGPT: Tantalizing afterthoughts in search of story trajectories [induction heads]

Bill Benzon 3 Feb 2023 10:35 UTC
4 points
0 comments 20 min read LW link

Jordan Peterson: Guru/Villain

Bryan Frances 3 Feb 2023 9:02 UTC
−14 points
6 comments 9 min read LW link

[Question] What is the risk of asking a counterfactual oracle a question that already had its answer erased?

Chris_Leong 3 Feb 2023 3:13 UTC
7 points
0 comments 1 min read LW link

I don’t think MIRI “gave up”

Raemon 3 Feb 2023 0:26 UTC
106 points
64 comments 4 min read LW link

What fact that you know is true but most people aren’t ready to accept it?

lorepieri 3 Feb 2023 0:06 UTC
47 points
210 comments 1 min read LW link

[Question] Monotonous Work

Gideon Bauer 2 Feb 2023 21:35 UTC
1 point
0 comments 1 min read LW link

Is AI risk assessment too anthropocentric?

Craig Mattson 2 Feb 2023 21:34 UTC
3 points
6 comments 1 min read LW link

Halifax Monthly Meetup: Introduction to Effective Altruism

Ideopunk 2 Feb 2023 21:10 UTC
10 points
0 comments 1 min read LW link

Conditioning Predictive Models: Outer alignment via careful conditioning

2 Feb 2023 20:28 UTC
72 points
15 comments 57 min read LW link

Conditioning Predictive Models: Large language models as predictors

2 Feb 2023 20:28 UTC
88 points
4 comments 13 min read LW link

Normative vs Descriptive Models of Agency

mattmacdermott 2 Feb 2023 20:28 UTC
26 points
5 comments 4 min read LW link

Andrew Huberman on How to Optimize Sleep

Leon Lang 2 Feb 2023 20:17 UTC
37 points
6 comments 6 min read LW link

[Question] How can I help inflammation-based nerve damage be temporary?

Optimization Process 2 Feb 2023 19:20 UTC
17 points
4 comments 1 min read LW link

More findings on maximal data dimension

Marius Hobbhahn 2 Feb 2023 18:33 UTC
27 points
1 comment 11 min read LW link

Heritability, Behaviorism, and Within-Lifetime RL

Steven Byrnes 2 Feb 2023 16:34 UTC
39 points
3 comments 4 min read LW link

Covid 2/2/23: The Emergency Ends on 5/11

Zvi 2 Feb 2023 14:00 UTC
22 points
6 comments 7 min read LW link
(thezvi.wordpress.com)

You are probably not a good alignment researcher, and other blatant lies

junk heap homotopy 2 Feb 2023 13:55 UTC
83 points
16 comments 2 min read LW link

Don’t Judge a Tool by its Average Output

silentbob 2 Feb 2023 13:42 UTC
11 points
2 comments 4 min read LW link

Epoch Impact Report 2022

Jsevillamol 2 Feb 2023 13:09 UTC
16 points
0 comments 1 min read LW link

You Don’t Exist, Duncan

Duncan Sabien (Deactivated) 2 Feb 2023 8:37 UTC
247 points
107 comments 9 min read LW link

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Roman Leventov 2 Feb 2023 6:29 UTC
6 points
4 comments 1 min read LW link
(arxiv.org)

Research agenda: Formalizing abstractions of computations

Erik Jenner 2 Feb 2023 4:29 UTC
92 points
10 comments 31 min read LW link

Progress links and tweets, 2023-02-01

jasoncrawford 2 Feb 2023 2:25 UTC
10 points
0 comments 1 min read LW link
(rootsofprogress.org)

Retrospective on the AI Safety Field Building Hub

Vael Gates 2 Feb 2023 2:06 UTC
30 points
0 comments 1 min read LW link

How to export Android Chrome tabs to an HTML file in Linux (as of February 2023)

Adam Scherlis 2 Feb 2023 2:03 UTC
7 points
3 comments 2 min read LW link
(adam.scherlis.com)

Hacked Account Spam

jefftk 2 Feb 2023 1:50 UTC
13 points
5 comments 1 min read LW link
(www.jefftk.com)

A simple technique to reduce negative rumination

cranberry_bear 2 Feb 2023 1:33 UTC
9 points
0 comments 1 min read LW link