
Refusal in LLMs is mediated by a single direction

27 Apr 2024 11:13 UTC
112 points
24 comments · 9 min read · LW link

Constructability: Plainly-coded AGIs may be feasible in the near future

27 Apr 2024 16:04 UTC
52 points
7 comments · 13 min read · LW link

On Not Pulling The Ladder Up Behind You

Screwtape · 26 Apr 2024 21:58 UTC
95 points
5 comments · 9 min read · LW link

So What’s Up With PUFAs Chemically?

J Bostock · 27 Apr 2024 13:32 UTC
35 points
16 comments · 6 min read · LW link

Duct Tape security

Isaac King · 26 Apr 2024 18:57 UTC
66 points
8 comments · 5 min read · LW link

Superposition is not “just” neuron polysemanticity

LawrenceC · 26 Apr 2024 23:22 UTC
46 points
3 comments · 13 min read · LW link

[Question] Examples of Highly Counterfactual Discoveries?

johnswentworth · 23 Apr 2024 22:19 UTC
156 points
83 comments · 1 min read · LW link

The first future and the best future

KatjaGrace · 25 Apr 2024 6:40 UTC
97 points
9 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Spatial attention as a “tell” for empathetic simulation?

Steven Byrnes · 26 Apr 2024 15:10 UTC
49 points
9 comments · 8 min read · LW link

D&D.Sci Long War: Defender of Data-mocracy

aphyer · 26 Apr 2024 22:30 UTC
38 points
8 comments · 3 min read · LW link

Thoughts on seed oil

dynomight · 20 Apr 2024 12:29 UTC
260 points
80 comments · 17 min read · LW link
(dynomight.net)

We are headed into an extreme compute overhang

devrandom · 26 Apr 2024 21:38 UTC
33 points
12 comments · 2 min read · LW link

Simple probes can catch sleeper agents

23 Apr 2024 21:10 UTC
118 points
14 comments · 1 min read · LW link
(www.anthropic.com)

Improving Dictionary Learning with Gated Sparse Autoencoders

25 Apr 2024 18:43 UTC
60 points
23 comments · 1 min read · LW link
(arxiv.org)

Scaling of AI training runs will slow down after GPT-5

Maxime Riché · 26 Apr 2024 16:05 UTC
34 points
5 comments · 3 min read · LW link

Two Vernor Vinge Book Reviews

Maxwell Tabarrok · 27 Apr 2024 12:14 UTC
13 points
0 comments · 2 min read · LW link
(www.maximum-progress.com)

Link: Let’s Think Dot by Dot: Hidden Computation in Transformer Language Models by Jacob Pfau, William Merrill & Samuel R. Bowman

Chris_Leong · 27 Apr 2024 13:22 UTC
11 points
0 comments · 1 min read · LW link
(twitter.com)

An Introduction to AI Sandbagging

26 Apr 2024 13:40 UTC
34 points
0 comments · 8 min read · LW link

Mercy to the Machine: Thoughts & Rights

False Name · 27 Apr 2024 16:36 UTC
8 points
5 comments · 17 min read · LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai · 16 Apr 2024 21:16 UTC
310 points
64 comments · 12 min read · LW link