Experimentation (Part 7 of “The Sense Of Physical Necessity”)

LoganStrohl · 18 Mar 2024 21:25 UTC
33 points
0 comments · 10 min read · LW link

INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park

jacobhaimes · 18 Mar 2024 21:21 UTC
5 points
0 comments · 1 min read · LW link
(into-ai-safety.github.io)

Neuroscience and Alignment

Garrett Baker · 18 Mar 2024 21:09 UTC
40 points
25 comments · 2 min read · LW link

GPT, the magical collaboration zone, Lex Fridman and Sam Altman

Bill Benzon · 18 Mar 2024 20:04 UTC
3 points
1 comment · 3 min read · LW link

Measuring Coherence of Policies in Toy Environments

18 Mar 2024 17:59 UTC
59 points
9 comments · 14 min read · LW link

AtP*: An efficient and scalable method for localizing LLM behaviour to components

18 Mar 2024 17:28 UTC
19 points
0 comments · 1 min read · LW link
(arxiv.org)

Community Notes by X

NicholasKees · 18 Mar 2024 17:13 UTC
124 points
15 comments · 7 min read · LW link

[Question] Is the Basilisk pretending to be hidden in this simulation so that it can check what I would do if conditioned by a world without the Basilisk?

maybefbi · 18 Mar 2024 16:05 UTC
−18 points
1 comment · 1 min read · LW link

On Devin

Zvi · 18 Mar 2024 13:20 UTC
148 points
34 comments · 11 min read · LW link
(thezvi.wordpress.com)

RLLMv10 experiment

MiguelDev · 18 Mar 2024 8:32 UTC
5 points
0 comments · 2 min read · LW link

Join the AI Evaluation Tasks Bounty Hackathon

Esben Kran · 18 Mar 2024 8:15 UTC
12 points
1 comment · 1 min read · LW link

5 Physics Problems

18 Mar 2024 8:05 UTC
60 points
0 comments · 15 min read · LW link

Inferring the model dimension of API-protected LLMs

Ege Erdil · 18 Mar 2024 6:19 UTC
34 points
3 comments · 4 min read · LW link
(arxiv.org)

AI strategy given the need for good reflection

owencb · 18 Mar 2024 0:48 UTC
7 points
0 comments · 1 min read · LW link

XAI releases Grok base model

Jacob G-W · 18 Mar 2024 0:47 UTC
11 points
3 comments · 1 min read · LW link
(x.ai)

Toki pona FAQ

dkl9 · 17 Mar 2024 21:44 UTC
36 points
8 comments · 1 min read · LW link
(dkl9.net)

EA ErFiN Project work

Max_He-Ho · 17 Mar 2024 20:42 UTC
2 points
0 comments · 1 min read · LW link

EA ErFiN Project work

Max_He-Ho · 17 Mar 2024 20:37 UTC
2 points
0 comments · 1 min read · LW link

[Question] Alice and Bob is debating on a technique. Alice says Bob should try it before denying it. Is it a fallacy or something similar?

Ooker · 17 Mar 2024 20:01 UTC
0 points
19 comments · 2 min read · LW link

Is there a way to calculate the P(we are in a 2nd cold war)?

cloak · 17 Mar 2024 20:01 UTC
−9 points
2 comments · 1 min read · LW link

The Worst Form Of Government (Except For Everything Else We’ve Tried)

johnswentworth · 17 Mar 2024 18:11 UTC
134 points
47 comments · 4 min read · LW link

Applying simulacrum levels to hobbies, interests and goals

DMMF · 17 Mar 2024 16:18 UTC
15 points
2 comments · 4 min read · LW link
(danfrank.ca)

What is the best argument that LLMs are shoggoths?

JoshuaFox · 17 Mar 2024 11:36 UTC
26 points
22 comments · 1 min read · LW link

Invitation to the Princeton AI Alignment and Safety Seminar

Sadhika Malladi · 17 Mar 2024 1:10 UTC
6 points
1 comment · 1 min read · LW link

Anxiety vs. Depression

Sable · 17 Mar 2024 0:15 UTC
85 points
35 comments · 3 min read · LW link
(affablyevil.substack.com)

Celiefs

TheLemmaLlama · 16 Mar 2024 23:56 UTC
3 points
8 comments · 1 min read · LW link

My PhD thesis: Algorithmic Bayesian Epistemology

Eric Neyman · 16 Mar 2024 22:56 UTC
259 points
14 comments · 7 min read · LW link
(arxiv.org)

How people stopped dying from diarrhea so much (& other life-saving decisions)

Writer · 16 Mar 2024 16:00 UTC
45 points
0 comments · 1 min read · LW link
(youtu.be)

Transformative trustbuilding via advancements in decentralized lie detection

trevor · 16 Mar 2024 5:56 UTC
17 points
7 comments · 38 min read · LW link
(www.ncbi.nlm.nih.gov)

Enter the WorldsEnd

Akram Choudhary · 16 Mar 2024 1:34 UTC
−25 points
8 comments · 1 min read · LW link

Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing?

Benjamin Bourlier · 15 Mar 2024 23:17 UTC
−4 points
3 comments · 16 min read · LW link

Introducing METR’s Autonomy Evaluation Resources

15 Mar 2024 23:16 UTC
90 points
0 comments · 1 min read · LW link
(metr.github.io)

Are AIs conscious? It might depend

Logan Zoellner · 15 Mar 2024 23:09 UTC
6 points
6 comments · 3 min read · LW link

Beyond Maxipok — good reflective governance as a target for action

owencb · 15 Mar 2024 22:22 UTC
20 points
0 comments · 1 min read · LW link

Middle Child Phenomenon

PhilosophicalSoul · 15 Mar 2024 20:47 UTC
3 points
3 comments · 2 min read · LW link

Capability or Alignment? Respect the LLM Base Model’s Capability During Alignment

Jingfeng Yang · 15 Mar 2024 17:56 UTC
7 points
0 comments · 24 min read · LW link

Rational Animations offers animation production and writing services!

Writer · 15 Mar 2024 17:26 UTC
33 points
0 comments · 1 min read · LW link

Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features

15 Mar 2024 16:30 UTC
26 points
5 comments · 4 min read · LW link

Stuttgart, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R · 15 Mar 2024 14:59 UTC
2 points
1 comment · 1 min read · LW link

Controlling AGI Risk

TeaSea · 15 Mar 2024 4:56 UTC
6 points
8 comments · 4 min read · LW link

Ulm, Germany—ACX Spring Meetups Everywhere 2024

Benjamin R · 15 Mar 2024 1:32 UTC
2 points
1 comment · 1 min read · LW link

Newport News/ Virginia ACX Meetup

Daniel · 14 Mar 2024 23:46 UTC
1 point
0 comments · 1 min read · LW link

Constructive Cauchy sequences vs. Dedekind cuts

jessicata · 14 Mar 2024 23:04 UTC
47 points
23 comments · 4 min read · LW link
(unstableontology.com)

A Nail in the Coffin of Exceptionalism

Yeshua God · 14 Mar 2024 22:41 UTC
−17 points
0 comments · 3 min read · LW link

Toward a Broader Conception of Adverse Selection

Ricki Heicklen · 14 Mar 2024 22:40 UTC
177 points
61 comments · 13 min read · LW link
(bayesshammai.substack.com)

More people getting into AI safety should do a PhD

AdamGleave · 14 Mar 2024 22:14 UTC
60 points
24 comments · 12 min read · LW link
(gleave.me)

Collection (Part 6 of “The Sense Of Physical Necessity”)

LoganStrohl · 14 Mar 2024 21:37 UTC
28 points
0 comments · 8 min read · LW link

Fixed point or oscillate or noise

lemonhope · 14 Mar 2024 18:37 UTC
3 points
10 comments · 1 min read · LW link

How useful is “AI Control” as a framing on AI X-Risk?

14 Mar 2024 18:06 UTC
70 points
4 comments · 34 min read · LW link

Sparse autoencoders find composed features in small toy models

14 Mar 2024 18:00 UTC
33 points
12 comments · 15 min read · LW link