
Sam Marks

Karma: 1,910

Evaluating Sparse Autoencoders with Board Game Models

2 Aug 2024 19:50 UTC
38 points
1 comment, 9 min read

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks, 18 Apr 2024 16:17 UTC
107 points
10 comments, 12 min read

What’s up with LLMs representing XORs of arbitrary features?

Sam Marks, 3 Jan 2024 19:44 UTC
157 points
61 comments, 16 min read

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks, 5 Dec 2023 6:05 UTC
45 points
7 comments, 5 min read

Thoughts on open source AI

Sam Marks, 3 Nov 2023 15:35 UTC
62 points
17 comments, 10 min read

Turning off lights with model editing

Sam Marks, 12 May 2023 20:25 UTC
67 points
5 comments, 2 min read
(arxiv.org)

[Crosspost] ACX 2022 Prediction Contest Results

24 Jan 2023 6:56 UTC
46 points
6 comments, 8 min read

AGISF adaptation for in-person groups

13 Jan 2023 3:24 UTC
44 points
2 comments, 3 min read

Update on Harvard AI Safety Team and MIT AI Alignment

2 Dec 2022 0:56 UTC
60 points
4 comments, 8 min read

Recommend HAIST resources for assessing the value of RLHF-related alignment research

5 Nov 2022 20:58 UTC
26 points
9 comments, 3 min read

Caution when interpreting Deepmind’s In-context RL paper

Sam Marks, 1 Nov 2022 2:42 UTC
105 points
8 comments, 4 min read

Safety considerations for online generative modeling

Sam Marks, 7 Jul 2022 18:31 UTC
42 points
9 comments, 14 min read

Proxy misspecification and the capabilities vs. value learning race

Sam Marks, 16 May 2022 18:58 UTC
23 points
3 comments, 4 min read

If you’re very optimistic about ELK then you should be optimistic about outer alignment

Sam Marks, 27 Apr 2022 19:30 UTC
17 points
8 comments, 3 min read

Sam Marks’s Shortform

Sam Marks, 13 Apr 2022 21:38 UTC
3 points
26 comments, 1 min read

2022 ACX predictions: market prices

Sam Marks, 6 Mar 2022 6:24 UTC
21 points
2 comments, 5 min read

Movie review: Don’t Look Up

Sam Marks, 4 Jan 2022 20:16 UTC
35 points
6 comments, 11 min read

[Book review] Gödel, Escher, Bach: an in-depth explainer

Sam Marks, 29 Sep 2021 19:03 UTC
98 points
23 comments, 23 min read, 1 review

[Question] For mRNA vaccines, is (short-term) efficacy really higher after the second dose?

Sam Marks, 25 Apr 2021 20:21 UTC
27 points
13 comments, 3 min read