Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
103 points
28 comments · 5 min read · LW link

Robin Hanson & Liron Shapira Debate AI X-Risk

Liron · 8 Jul 2024 21:45 UTC
34 points
4 comments · 1 min read · LW link
(www.youtube.com)

“The Singularity Is Nearer” by Ray Kurzweil—Review

Lavender · 8 Jul 2024 21:32 UTC
22 points
0 comments · 4 min read · LW link

Sample Prevalence vs Global Prevalence

jefftk · 8 Jul 2024 21:00 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Advice to junior AI governance researchers

Akash · 8 Jul 2024 19:19 UTC
65 points
1 comment · 5 min read · LW link

Pantheon Interface

8 Jul 2024 19:03 UTC
124 points
22 comments · 6 min read · LW link

Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes

ChristianWilliams · 8 Jul 2024 17:20 UTC
5 points
0 comments · 1 min read · LW link
(www.metaculus.com)

The Golden Mean of Scientific Virtues

adamShimi · 8 Jul 2024 17:16 UTC
12 points
4 comments · 8 min read · LW link
(epistemologicalfascinations.substack.com)

Massapequa (Long Island), New York, USA – ACX Meetup

Gabriel Weil · 8 Jul 2024 17:01 UTC
2 points
0 comments · 1 min read · LW link

Dialogue introduction to Singular Learning Theory

Olli Järviniemi · 8 Jul 2024 16:58 UTC
97 points
14 comments · 8 min read · LW link

Announcing The Techno-Humanist Manifesto: A new philosophy of progress for the 21st century

jasoncrawford · 8 Jul 2024 16:33 UTC
16 points
4 comments · 5 min read · LW link
(blog.rootsofprogress.org)

Response to Dileep George: AGI safety warrants planning ahead

Steven Byrnes · 8 Jul 2024 15:27 UTC
27 points
7 comments · 27 min read · LW link

Why not parliamentarianism? [book by Tiago Ribeiro dos Santos]

Arturo Macias · 8 Jul 2024 14:57 UTC
2 points
1 comment · 4 min read · LW link

Games of My Childhood: The Troops

Kaj_Sotala · 8 Jul 2024 11:20 UTC
18 points
0 comments · 5 min read · LW link
(kajsotala.fi)

Towards shutdownable agents via stochastic choice

8 Jul 2024 10:14 UTC
59 points
11 comments · 23 min read · LW link
(arxiv.org)

On scalable oversight with weak LLMs judging strong LLMs

8 Jul 2024 8:59 UTC
49 points
18 comments · 7 min read · LW link
(arxiv.org)

Poker is a bad game for teaching epistemics. Figgie is a better one.

rossry · 8 Jul 2024 6:05 UTC
102 points
47 comments · 11 min read · LW link
(blog.rossry.net)

Controlled Creative Destruction

Martin Sustrik · 8 Jul 2024 4:36 UTC
11 points
0 comments · 2 min read · LW link

On saying “Thank you” instead of “I’m Sorry”

Michael Cohn · 8 Jul 2024 3:13 UTC
130 points
16 comments · 3 min read · LW link

How can I get over my fear of becoming an emulated consciousness?

James Dowdell · 7 Jul 2024 22:02 UTC
6 points
8 comments · 5 min read · LW link

An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2

Neel Nanda · 7 Jul 2024 17:39 UTC
134 points
15 comments · 25 min read · LW link

Joint mandatory donation as a way to increase the number of donations

Crazy philosopher · 7 Jul 2024 10:56 UTC
3 points
3 comments · 2 min read · LW link

Rationality vs Alignment

Donatas Lučiūnas · 7 Jul 2024 10:12 UTC
−14 points
14 comments · 2 min read · LW link

Beyond Biomarkers: Understanding Multiscale Causality

Matěj Nekoranec · 7 Jul 2024 9:56 UTC
13 points
0 comments · 7 min read · LW link

Goodhart’s Law and Emotions

Zero Contradictions · 7 Jul 2024 8:32 UTC
1 point
5 comments · 1 min read · LW link
(expandingrationality.substack.com)

Reflections on Less Online

Error · 7 Jul 2024 3:49 UTC
85 points
15 comments · 18 min read · LW link

LK-99 in retrospect

bhauth · 7 Jul 2024 2:06 UTC
72 points
21 comments · 3 min read · LW link
(www.bhauth.com)

NYU Debate Training Update: Methods, Baselines, Preliminary Results

samarnesen · 6 Jul 2024 18:28 UTC
9 points
0 comments · 20 min read · LW link

Scalable oversight as a quantitative rather than qualitative problem

Buck · 6 Jul 2024 17:42 UTC
85 points
11 comments · 3 min read · LW link

An AI Manhattan Project is Not Inevitable

Maxwell Tabarrok · 6 Jul 2024 16:42 UTC
38 points
25 comments · 4 min read · LW link
(www.maximum-progress.com)

[Linkpost] A Case for AI Consciousness

6 Jul 2024 14:52 UTC
19 points
2 comments · 1 min read · LW link
(philpapers.org)

[Question] Can agents coordinate on randomness without outside sources?

Mikhail Samin · 6 Jul 2024 13:43 UTC
6 points
16 comments · 1 min read · LW link

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

6 Jul 2024 11:34 UTC
57 points
7 comments · 6 min read · LW link

Links and brief musings for June

Kaj_Sotala · 6 Jul 2024 10:10 UTC
26 points
0 comments · 10 min read · LW link
(kajsotala.fi)

Indecision and internalized authority figures

Kaj_Sotala · 6 Jul 2024 10:10 UTC
67 points
1 comment · 2 min read · LW link
(kajsotala.fi)

Free Will, Determinism, And Choice

Zero Contradictions · 6 Jul 2024 6:34 UTC
7 points
3 comments · 1 min read · LW link
(thewaywardaxolotl.blogspot.com)

Travel Buffer

jefftk · 6 Jul 2024 2:20 UTC
17 points
3 comments · 1 min read · LW link
(www.jefftk.com)

[Question] What progress have we made on automated auditing?

LawrenceC · 6 Jul 2024 1:49 UTC
38 points
1 comment · 1 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley · 6 Jul 2024 1:23 UTC
56 points
39 comments · 24 min read · LW link

D&D.Sci: Whom Shall You Call?

abstractapplic · 5 Jul 2024 20:53 UTC
38 points
6 comments · 2 min read · LW link

[Interim research report] Activation plateaus & sensitive directions in GPT2

5 Jul 2024 17:05 UTC
64 points
2 comments · 5 min read · LW link

Minimalist And Maximalist Type Systems

adamShimi · 5 Jul 2024 16:25 UTC
17 points
6 comments · 3 min read · LW link
(epistemologicalfascinations.substack.com)

ML4Good Summer Bootcamps—Applications Open [deadline extended]

YM · 5 Jul 2024 13:59 UTC
12 points
0 comments · 1 min read · LW link

[Question] Are there any plans to launch a paperback version of “Rationality: From AI to Zombies”?

m_arj · 5 Jul 2024 11:14 UTC
2 points
1 comment · 1 min read · LW link

Doomsday Argument and the False Dilemma of Anthropic Reasoning

Ape in the coat · 5 Jul 2024 5:38 UTC
35 points
55 comments · 7 min read · LW link

Finding the Wisdom to Build Safe AI

Gordon Seidoh Worley · 4 Jul 2024 19:04 UTC
36 points
10 comments · 9 min read · LW link

Libs vs Frameworks, Middle-Level Regularities vs Theories

adamShimi · 4 Jul 2024 19:01 UTC
23 points
0 comments · 2 min read · LW link
(epistemologicalfascinations.substack.com)

The Potential Impossibility of Subjective Death

VictorLJZ · 4 Jul 2024 18:17 UTC
3 points
34 comments · 1 min read · LW link

Consider the humble rock (or: why the dumb thing kills you)

pleiotroth · 4 Jul 2024 13:54 UTC
58 points
11 comments · 4 min read · LW link

AI #71: Farewell to Chevron

Zvi · 4 Jul 2024 13:40 UTC
53 points
9 comments · 36 min read · LW link
(thezvi.wordpress.com)