
jacek

Karma: 204

(My) self-referential reason to believe in free will

jacek, Jan 6, 2025, 11:35 PM
12 points
6 comments, 1 min read

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024, 1:44 PM
42 points
4 comments, 1 min read
(arxiv.org)

Goodhart’s Law in Reinforcement Learning

Oct 16, 2023, 12:54 AM
126 points
22 comments, 7 min read

A warm-up for the AI governance project

jacek, Feb 17, 2023, 6:06 PM
10 points
2 comments, 3 min read

Categorical-measure-theoretic approach to optimal policies tending to seek power

jacek, Jan 12, 2023, 12:32 AM
31 points
3 comments, 6 min read