Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
191 points
82 comments · 10 min read · LW link

Open Thread Spring 2024

habryka 11 Mar 2024 19:17 UTC
22 points
144 comments · 1 min read · LW link

Language Models Model Us

eggsyntax 17 May 2024 21:00 UTC
134 points
43 comments · 7 min read · LW link

[Question] Are most people deeply confused about “love”, or am I missing a human universal?

SpectrumDT 23 May 2024 13:22 UTC
2 points
3 comments · 3 min read · LW link

The Button (Short Comic)

milanrosko 22 May 2024 23:28 UTC
2 points
1 comment · 1 min read · LW link

Let’s make the truth easier to find

DPiepgrass 20 Mar 2023 4:28 UTC
24 points
44 comments · 1 min read · LW link

Why entropy means you might not have to worry as much about superintelligent AI

Ron J 23 May 2024 3:52 UTC
−13 points
1 comment · 2 min read · LW link

A Bi-Modal Brain Model

Johannes C. Mayer 22 May 2024 20:10 UTC
12 points
2 comments · 2 min read · LW link

What will the first human-level AI look like, and how might things go wrong?

EuanMcLean 23 May 2024 11:17 UTC
5 points
1 comment · 15 min read · LW link

“Reframing Superintelligence” + LLMs + 4 years

Eric Drexler 10 Jul 2023 13:42 UTC
117 points
9 comments · 12 min read · LW link

How to train your own “Sleeper Agents”

evhub 7 Feb 2024 0:31 UTC
91 points
11 comments · 2 min read · LW link

[Question] SAE sparse feature graph using only residual layers

crayhippo 23 May 2024 13:32 UTC
0 points
0 comments · 1 min read · LW link

Power Law Policy

Ben Turtel 23 May 2024 5:28 UTC
9 points
2 comments · 6 min read · LW link
(bturtel.substack.com)

[Question] Which skincare products are evidence-based?

Vanessa Kosoy 2 May 2024 15:22 UTC
108 points
44 comments · 1 min read · LW link

Each Llama3-8b text uses a different “random” subspace of the activation space

tailcalled 22 May 2024 7:31 UTC
3 points
4 comments · 7 min read · LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen 21 May 2024 2:22 UTC
158 points
12 comments · 1 min read · LW link

Executive Dysfunction 101

DaystarEld 23 May 2024 12:43 UTC
7 points
0 comments · 3 min read · LW link
(daystareld.com)

AI #65: I Spy With My AI

Zvi 23 May 2024 12:40 UTC
12 points
0 comments · 43 min read · LW link
(thezvi.wordpress.com)

The predictive power of dissipative adaptation

dr_s 17 Dec 2023 14:01 UTC
45 points
13 comments · 19 min read · LW link

What mistakes has the AI safety movement made?

EuanMcLean 23 May 2024 11:19 UTC
18 points
0 comments · 12 min read · LW link