AXRP Episode 35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization

DanielFilan 24 Aug 2024 22:30 UTC
21 points
0 comments 74 min read LW link

The top 30 books to expand the capabilities of AI: a biased reading list

Jonathan Mugan 24 Aug 2024 21:48 UTC
−6 points
0 comments 16 min read LW link

The Ap Distribution

criticalpoints 24 Aug 2024 21:45 UTC
22 points
7 comments 3 min read LW link
(eregis.github.io)

What is it to solve the alignment problem?

Joe Carlsmith 24 Aug 2024 21:19 UTC
69 points
17 comments 53 min read LW link

Examine self modification as an intuition provider for the concept of consciousness

weightt an 24 Aug 2024 20:48 UTC
−5 points
2 comments 10 min read LW link

[Question] Looking to interview AI Safety researchers for a book

jeffreycaruso 24 Aug 2024 19:57 UTC
14 points
0 comments 1 min read LW link

Perplexity wins my AI race

Elizabeth 24 Aug 2024 19:20 UTC
107 points
12 comments 10 min read LW link
(acesounderglass.com)

Why should anyone boot *you* up?

onur 24 Aug 2024 17:51 UTC
−1 points
5 comments 3 min read LW link
(solmaz.io)

Understanding Hidden Computations in Chain-of-Thought Reasoning

rokosbasilisk 24 Aug 2024 16:35 UTC
6 points
1 comment 1 min read LW link

August 2024 Time Tracking

jefftk 24 Aug 2024 13:50 UTC
22 points
0 comments 3 min read LW link
(www.jefftk.com)

Training a Sparse Autoencoder in < 30 minutes on 16GB of VRAM using an S3 cache

Louka Ewington-Pitsos 24 Aug 2024 7:39 UTC
17 points
0 comments 5 min read LW link

[Question] Looking for intuitions to extend bargaining notions

ProgramCrafter 24 Aug 2024 5:00 UTC
13 points
0 comments 1 min read LW link

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs

Michaël Trazzi 24 Aug 2024 4:30 UTC
55 points
0 comments 5 min read LW link

[Question] Developing Positive Habits through Video Games

pzas 24 Aug 2024 3:47 UTC
1 point
5 comments 1 min read LW link

“Can AI Scaling Continue Through 2030?”, Epoch AI (yes)

gwern 24 Aug 2024 1:40 UTC
129 points
4 comments 3 min read LW link
(epochai.org)

What’s important in “AI for epistemics”?

Lukas Finnveden 24 Aug 2024 1:27 UTC
41 points
0 comments 28 min read LW link
(lukasfinnveden.substack.com)

Showing SAE Latents Are Not Atomic Using Meta-SAEs

24 Aug 2024 0:56 UTC
61 points
9 comments 20 min read LW link

Using ideologically-charged language to get gpt-3.5-turbo to disobey it’s system prompt: a demo

Milan W 24 Aug 2024 0:13 UTC
3 points
0 comments 6 min read LW link

Crafting Polysemantic Transformer Benchmarks with Known Circuits

23 Aug 2024 22:03 UTC
10 points
0 comments 25 min read LW link

[Question] What is an appropriate sample size when surveying billions of data points?

Blake 23 Aug 2024 21:54 UTC
1 point
2 comments 1 min read LW link

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

23 Aug 2024 18:52 UTC
40 points
5 comments 16 min read LW link

How I started believing religion might actually matter for rationality and moral philosophy

zhukeepa 23 Aug 2024 17:40 UTC
128 points
41 comments 7 min read LW link

[Question] What do you expect AI capabilities may look like in 2028?

nonzerosum 23 Aug 2024 16:59 UTC
9 points
5 comments 1 min read LW link

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)

23 Aug 2024 14:18 UTC
17 points
2 comments 4 min read LW link

If we solve alignment, do we die anyway?

Seth Herd 23 Aug 2024 13:13 UTC
77 points
112 comments 4 min read LW link

What’s going on with Per-Component Weight Updates?

4gate 22 Aug 2024 21:22 UTC
1 point
0 comments 6 min read LW link

Interoperable High Level Structures: Early Thoughts on Adjectives

22 Aug 2024 21:12 UTC
49 points
1 comment 7 min read LW link

Interest poll: A time-waster blocker for desktop Linux programs

nahoj 22 Aug 2024 20:44 UTC
4 points
5 comments 1 min read LW link

Turning 22 in the Pre-Apocalypse

testingthewaters 22 Aug 2024 20:28 UTC
37 points
14 comments 24 min read LW link
(utilityhotbar.github.io)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed

22 Aug 2024 19:19 UTC
42 points
4 comments 4 min read LW link

what becoming more secure did for me

Chipmonk 22 Aug 2024 17:44 UTC
26 points
5 comments 2 min read LW link
(chrislakin.blog)

A primer on the current state of longevity research

Abhishaike Mahajan 22 Aug 2024 17:14 UTC
109 points
6 comments 14 min read LW link
(www.owlposting.com)

Some reasons to start a project to stop harmful AI

Remmelt 22 Aug 2024 16:23 UTC
5 points
0 comments 2 min read LW link

The economics of space tethers

harsimony 22 Aug 2024 16:15 UTC
67 points
22 comments 7 min read LW link
(splittinginfinity.substack.com)

Dima’s Shortform

Dmitrii Krasheninnikov 22 Aug 2024 14:49 UTC
1 point
0 comments 1 min read LW link

AI #78: Some Welcome Calm

Zvi 22 Aug 2024 14:20 UTC
61 points
15 comments 33 min read LW link
(thezvi.wordpress.com)

[Question] How do we know dreams aren’t real?

Logan Zoellner 22 Aug 2024 12:41 UTC
5 points
31 comments 1 min read LW link

Measuring Structure Development in Algorithmic Transformers

22 Aug 2024 8:38 UTC
56 points
4 comments 11 min read LW link

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs

22 Aug 2024 7:32 UTC
23 points
1 comment 21 min read LW link

Just because an LLM said it doesn’t mean it’s true: an illustrative example

dirk 21 Aug 2024 21:05 UTC
26 points
12 comments 3 min read LW link

[Question] How do you finish your tasks faster?

Cipolla 21 Aug 2024 20:01 UTC
4 points
2 comments 1 min read LW link

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

21 Aug 2024 18:09 UTC
11 points
0 comments 6 min read LW link
(newsletter.safe.ai)

[Question] Should LW suggest standard metaprompts?

Dagon 21 Aug 2024 16:41 UTC
3 points
6 comments 1 min read LW link

Eternal Existence and Eternal Boredom: The Case for AI and Immortal Humans

Tuan Tu Nguyen 21 Aug 2024 9:58 UTC
−12 points
2 comments 5 min read LW link

Please do not use AI to write for you

Richard_Kennaway 21 Aug 2024 9:53 UTC
65 points
34 comments 4 min read LW link

Ap­ply to Aether—In­de­pen­dent LLM Agent Safety Re­search Group

RohanS 21 Aug 2024 9:47 UTC
9 points
0 comments 7 min read LW link
(forum.effectivealtruism.org)

the Giga Press was a mistake

bhauth 21 Aug 2024 4:51 UTC
95 points
26 comments 5 min read LW link
(bhauth.com)

Exploring the Boundaries of Cognitohazards and the Nature of Reality

Victor Novikov 21 Aug 2024 3:42 UTC
−2 points
2 comments 1 min read LW link

[Question] What is the point of 2v2 debates?

Axel Ahlqvist 20 Aug 2024 21:59 UTC
2 points
1 comment 1 min read LW link

[Question] Where should I look for information on gut health?

FinalFormal2 20 Aug 2024 19:44 UTC
10 points
10 comments 1 min read LW link