mattmacdermott

Karma: 1,261

Is instrumental convergence a thing for virtue-driven agents?

mattmacdermott 2 Apr 2025 3:59 UTC
33 points
37 comments, 2 min read, LW link

Validating against a misalignment detector is very different to training against one

mattmacdermott 4 Mar 2025 15:41 UTC
39 points
4 comments, 4 min read, LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

24 Feb 2025 18:31 UTC
44 points
15 comments, 11 min read, LW link

Context-dependent consequentialism

4 Nov 2024 9:29 UTC
31 points
6 comments, 27 min read, LW link

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)

mattmacdermott 1 Sep 2024 7:46 UTC
26 points
0 comments, 5 min read, LW link
(yoshuabengio.org)

Bengio’s Alignment Proposal: “Towards a Cautious Scientist AI with Convergent Safety Bounds”

mattmacdermott 29 Feb 2024 13:59 UTC
76 points
19 comments, 14 min read, LW link
(yoshuabengio.org)

mattmacdermott’s Shortform

mattmacdermott 3 Jan 2024 9:08 UTC
4 points
36 comments, LW link

What’s next for the field of Agent Foundations?

30 Nov 2023 17:55 UTC
59 points
23 comments, 10 min read, LW link

Optimisation Measures: Desiderata, Impossibility, Proposals

7 Aug 2023 15:52 UTC
36 points
9 comments, 1 min read, LW link

Reward Hacking from a Causal Perspective

21 Jul 2023 18:27 UTC
29 points
6 comments, 7 min read, LW link

Incentives from a causal perspective

10 Jul 2023 17:16 UTC
27 points
0 comments, 6 min read, LW link

Agency from a causal perspective

30 Jun 2023 17:37 UTC
40 points
5 comments, 6 min read, LW link

Introduction to Towards Causal Foundations of Safe AGI

12 Jun 2023 17:55 UTC
70 points
6 comments, 4 min read, LW link

Some Summaries of Agent Foundations Work

mattmacdermott 15 May 2023 16:09 UTC
62 points
1 comment, 13 min read, LW link

Towards Measures of Optimisation

12 May 2023 15:29 UTC
53 points
37 comments, 4 min read, LW link

Normative vs Descriptive Models of Agency

mattmacdermott 2 Feb 2023 20:28 UTC
26 points
5 comments, 4 min read, LW link