By contrast, in this section I’m interested in what it means for an agent to have a goal of its own. Three existing frameworks which attempt to answer this question are Von Neumann and Morgenstern’s expected utility maximisation, Daniel Dennett’s intentional stance, and Hubinger et al.’s mesa-optimisation. I don’t think any of them adequately characterises the type of goal-directed behaviour we want to understand, though. While we can prove elegant theoretical results about utility functions, they are such a broad formalism that practically any behaviour can be described as maximising some utility function.
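To see why the formalism is so permissive, here is a minimal sketch of the degeneracy argument: for any fixed behaviour whatsoever, we can construct a utility function which that behaviour maximises, so "maximises some utility function" on its own places no real constraint on an agent. The policy, actions, and trajectories below are hypothetical toys chosen purely for illustration, not drawn from any of the frameworks above.

```python
# Sketch: any arbitrary behaviour maximises *some* utility function.
# Everything here (ACTIONS, arbitrary_policy) is a toy assumption.

from itertools import product

ACTIONS = ["left", "right"]

def arbitrary_policy(history):
    """Any behaviour at all -- here, alternate actions by history length."""
    return ACTIONS[len(history) % 2]

def induced_utility(trajectory):
    """Utility 1 iff every action matches what the policy would have chosen
    at that point in the trajectory, 0 otherwise."""
    history = []
    for action in trajectory:
        if action != arbitrary_policy(history):
            return 0
        history.append(action)
    return 1

# The single length-3 trajectory the policy would produce gets utility 1;
# every other trajectory gets 0 -- so the policy trivially maximises
# this (reverse-engineered) utility function.
for traj in product(ACTIONS, repeat=3):
    print(traj, induced_utility(list(traj)))
```

The same construction works for any policy we substitute in, which is exactly why "is an expected utility maximiser" fails to pick out the goal-directed agents from everything else.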
There is my algorithmic-theoretic definition, which might be regarded as a formalisation of the intentional stance and which avoids the degeneracy problem you mentioned.