beren

Karma: 2,992

Interested in many things. I have a personal blog at https://www.beren.io/

Maintaining Alignment during RSI as a Feedback Control Problem

beren · Mar 2, 2025, 12:21 AM
66 points
6 comments · 11 min read

Capital Ownership Will Not Prevent Human Disempowerment

beren · Jan 5, 2025, 6:00 AM
146 points
18 comments · 14 min read

[Question] When and why did ‘training’ become ‘pretraining’?

beren · Mar 8, 2024, 2:29 PM
16 points
6 comments · 1 min read

Theories of Change for AI Auditing

Nov 13, 2023, 7:33 PM
54 points
0 comments · 18 min read
(www.apolloresearch.ai)

[Linkpost] Biden-Harris Executive Order on AI

beren · Oct 30, 2023, 3:20 PM
3 points
0 comments · 1 min read

Preference Aggregation as Bayesian Inference

beren · Jul 27, 2023, 5:59 PM
14 points
1 comment · 1 min read

Thoughts on Loss Landscapes and why Deep Learning works

beren · Jul 25, 2023, 4:41 PM
53 points
4 comments · 18 min read

BCIs and the ecosystem of modular minds

beren · Jul 21, 2023, 3:58 PM
88 points
14 comments · 11 min read

Hedonic Loops and Taming RL

beren · Jul 19, 2023, 3:12 PM
20 points
14 comments · 9 min read

[Linkpost] Introducing Superalignment

beren · Jul 5, 2023, 6:23 PM
175 points
69 comments · 1 min read
(openai.com)

The case for removing alignment and ML research from the training dataset

beren · May 30, 2023, 8:54 PM
48 points
8 comments · 5 min read

Announcing Apollo Research

May 30, 2023, 4:17 PM
217 points
11 comments · 8 min read

A small update to the Sparse Coding interim research report

Apr 30, 2023, 7:54 PM
61 points
5 comments · 1 min read

Deep learning models might be secretly (almost) linear

beren · Apr 24, 2023, 6:43 PM
117 points
29 comments · 4 min read

Scaffolded LLMs as natural language computers

beren · Apr 12, 2023, 10:47 AM
95 points
10 comments · 11 min read

The surprising parameter efficiency of vision models

beren · Apr 8, 2023, 7:44 PM
81 points
28 comments · 4 min read

The Computational Anatomy of Human Values

beren · Apr 6, 2023, 10:33 AM
72 points
30 comments · 30 min read

Orthogonality is expensive

beren · Apr 3, 2023, 10:20 AM
43 points
9 comments · 3 min read

RLHF does not appear to differentially cause mode-collapse

Mar 20, 2023, 3:39 PM
95 points
9 comments · 3 min read

Against ubiquitous alignment taxes

beren · Mar 6, 2023, 7:50 PM
57 points
10 comments · 2 min read