WISDOMISM A Moral Theory for the Age of Information

Peter lawless 19 Apr 2024 23:06 UTC
4 points
0 comments · 9 min read · LW link

Inducing Unprompted Misalignment in LLMs

19 Apr 2024 20:00 UTC
22 points
5 comments · 16 min read · LW link

Progress Update #1 from the GDM Mech Interp Team: Full Update

19 Apr 2024 19:06 UTC
36 points
4 comments · 8 min read · LW link

Progress Update #1 from the GDM Mech Interp Team: Summary

19 Apr 2024 19:06 UTC
32 points
0 comments · 3 min read · LW link

[Question] What is the best way to talk about probabilities you expect to change with evidence/experiments?

Will_Pearson 19 Apr 2024 15:35 UTC
12 points
9 comments · 1 min read · LW link

CTMU insight: maybe consciousness *can* affect quantum outcomes?

zhukeepa 19 Apr 2024 15:23 UTC
12 points
3 comments · 4 min read · LW link

[Question] How to Model the Future of Open-Source LLMs?

Joel Burget 19 Apr 2024 14:28 UTC
10 points
1 comment · 1 min read · LW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins 19 Apr 2024 13:43 UTC
24 points
4 comments · 27 min read · LW link

[Question] If digital goods in virtual worlds increase GDP, do we actually become richer?

No77e 19 Apr 2024 10:06 UTC
4 points
6 comments · 1 min read · LW link

Experiment on repeating choices

KatjaGrace 19 Apr 2024 4:20 UTC
50 points
1 comment · 3 min read · LW link
(worldspiritsockpuppet.com)

Cohesion and business problems

Adam Zerner 19 Apr 2024 0:45 UTC
10 points
2 comments · 4 min read · LW link

The Thermodynamics of Death

Peter lawless 19 Apr 2024 0:36 UTC
4 points
0 comments · 10 min read · LW link

hydrogen tube transport

bhauth 18 Apr 2024 22:47 UTC
26 points
7 comments · 5 min read · LW link
(www.bhauth.com)

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton 18 Apr 2024 18:29 UTC
20 points
4 comments · 15 min read · LW link

Blessed information, garbage information, cursed information

tailcalled 18 Apr 2024 16:56 UTC
20 points
5 comments · 3 min read · LW link

[Fiction] A Confession

Arjun Panickssery 18 Apr 2024 16:28 UTC
28 points
2 comments · 5 min read · LW link
(arjunpanickssery.substack.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC
75 points
1 comment · 12 min read · LW link

Cooperation is optimal, with weaker agents too - tldr

Ryo 18 Apr 2024 15:03 UTC
12 points
16 comments · 4 min read · LW link
(medium.com)

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC
3 points
2 comments · 3 min read · LW link
(medium.com)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor 18 Apr 2024 8:39 UTC
28 points
1 comment · 19 min read · LW link