mattmacdermott

Karma: 1,180

Is instrumental convergence a thing for virtue-driven agents?

mattmacdermott
Apr 2, 2025, 3:59 AM
33 points
37 comments
2 min read

Validating against a misalignment detector is very different to training against one

mattmacdermott
Mar 4, 2025, 3:41 PM
29 points
4 comments
4 min read

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Feb 24, 2025, 6:31 PM
44 points
15 comments
11 min read

Context-dependent consequentialism

Nov 4, 2024, 9:29 AM
31 points
6 comments
27 min read

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)

mattmacdermott
Sep 1, 2024, 7:46 AM
26 points
0 comments
5 min read
(yoshuabengio.org)

Bengio’s Alignment Proposal: “Towards a Cautious Scientist AI with Convergent Safety Bounds”

mattmacdermott
Feb 29, 2024, 1:59 PM
76 points
19 comments
14 min read
(yoshuabengio.org)

mattmacdermott’s Shortform

mattmacdermott
Jan 3, 2024, 9:08 AM
4 points
32 comments

What’s next for the field of Agent Foundations?

Nov 30, 2023, 5:55 PM
59 points
23 comments
10 min read

Optimisation Measures: Desiderata, Impossibility, Proposals

Aug 7, 2023, 3:52 PM
36 points
9 comments
1 min read

Reward Hacking from a Causal Perspective

Jul 21, 2023, 6:27 PM
29 points
6 comments
7 min read

Incentives from a causal perspective

Jul 10, 2023, 5:16 PM
27 points
0 comments
6 min read

Agency from a causal perspective

Jun 30, 2023, 5:37 PM
40 points
5 comments
6 min read

Introduction to Towards Causal Foundations of Safe AGI

Jun 12, 2023, 5:55 PM
67 points
6 comments
4 min read

Some Summaries of Agent Foundations Work

mattmacdermott
May 15, 2023, 4:09 PM
62 points
1 comment
13 min read

Towards Measures of Optimisation

May 12, 2023, 3:29 PM
53 points
37 comments
4 min read

Normative vs Descriptive Models of Agency

mattmacdermott
Feb 2, 2023, 8:28 PM
26 points
5 comments
4 min read