
beren

Karma: 2,993

Interested in many things. I have a personal blog at https://www.beren.io/

Addendum: basic facts about language models during training

beren, Mar 6, 2023, 7:24 PM
22 points
2 comments, 5 min read

Basic facts about language models during training

beren, Feb 21, 2023, 11:46 AM
98 points
15 comments, 18 min read

Validator models: A simple approach to detecting goodharting

beren, Feb 20, 2023, 9:32 PM
14 points
1 comment, 4 min read

Empathy as a natural consequence of learnt reward models

beren, Feb 4, 2023, 3:35 PM
48 points
27 comments, 13 min read

AGI will have learnt utility functions

beren, Jan 25, 2023, 7:42 PM
36 points
4 comments, 13 min read

Gradient hacking is extremely difficult

beren, Jan 24, 2023, 3:45 PM
164 points
22 comments, 5 min read

Scaling laws vs individual differences

beren, Jan 10, 2023, 1:22 PM
45 points
21 comments, 7 min read

Basic Facts about Language Model Internals

Jan 4, 2023, 1:01 PM
130 points
19 comments, 9 min read

An ML interpretation of Shard Theory

beren, Jan 3, 2023, 8:30 PM
39 points
5 comments, 4 min read

The ultimate limits of alignment will determine the shape of the long term future

beren, Jan 2, 2023, 12:47 PM
34 points
2 comments, 6 min read

Evidence on recursive self-improvement from current ML

beren, Dec 30, 2022, 8:53 PM
31 points
12 comments, 6 min read

Human sexuality as an interesting case study of alignment

beren, Dec 30, 2022, 1:37 PM
39 points
26 comments, 3 min read

[Interim research report] Taking features out of superposition with sparse autoencoders

Dec 13, 2022, 3:41 PM
150 points
23 comments, 22 min read, 2 reviews

Deconfusing Direct vs Amortised Optimization

beren, Dec 2, 2022, 11:30 AM
134 points
19 comments, 10 min read

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

Nov 28, 2022, 12:54 PM
199 points
33 comments, 31 min read

Current themes in mechanistic interpretability research

Nov 16, 2022, 2:14 PM
89 points
2 comments, 12 min read

Interpreting Neural Networks through the Polytope Lens

Sep 23, 2022, 5:58 PM
144 points
29 comments, 33 min read