relativity tells us that a simple “state” abstraction isn’t quite right
Hmm, this sentence feels to me like a type error. It doesn’t seem like the way we reason about agents should depend on the fundamental laws of physics. If agents think in terms of states, then our model of agent goals should involve states regardless of whether that maps onto physics. (Another way of saying this is that agents are at a much higher level of abstraction than relativity.)
I don’t like reward functions, since that implies observability (at least usually it’s taken that way).
Hmm, you mean that reward is taken as observable? Yeah, this does seem like a significant drawback of talking about rewards. But if we assume that rewards are unobservable, I don’t see why reward functions aren’t expressive enough to encode utilitarianism—just let the reward at each timestep be net happiness at that timestep. Then we can describe utilitarians as trying to maximise reward.
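(To spell that encoding out as a minimal sketch, in my own notation rather than anything from the discussion above: write h_i(t) for the happiness of person i at timestep t, treat it as unobservable, and define reward as net happiness.)

```latex
% Minimal sketch of the "reward = net happiness" encoding (my notation).
% h_i(t): happiness of person i at timestep t, assumed unobservable.
r_t \;=\; \sum_i h_i(t),
\qquad
\text{utilitarian objective:}\quad
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} r_t\right].
% Infinite horizons or discounting would need extra care, but the point stands:
% nothing in the reward formalism itself requires rewards to be observed.
```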
I expect “simple goals and simple world-models” is going to generalize better than “simple policies”.
I think we’re talking about different debates here. I agree with the statement above—but the follow-up debate which I’m interested in is the comparison between “utility theory” and “a naive conception of goals and beliefs” (in philosophical parlance, the folk theory), and so this actually seems like a point in favour of the latter. What does utility theory add to the folk theory of agency? Here’s one example: utility theory says that deontological goals are very complicated. To me, it seems like folk theory wins this one, because lots of people have pretty deontological goals. Or another example: utility theory says that there’s a single type of entity to which we assign value. Folk theory doesn’t have a type system for goals, and again that seems more accurate to me (we have meta-goals, etc.).
To be clear, I do think that there are a bunch of things which the folk theory misses (mostly to do with probabilistic reasoning) and which utility theory highlights. But on the fundamental question of the content of goals (e.g. will they be more like “actually obey humans” or “tile the universe with tiny humans saying ‘good job’”) I’m not sure how much utility theory adds.
Hmm, this sentence feels to me like a type error. It doesn’t seem like the way we reason about agents should depend on the fundamental laws of physics. If agents think in terms of states, then our model of agent goals should involve states regardless of whether that maps onto physics. (Another way of saying this is that agents are at a much higher level of abstraction than relativity.)
True, but states aren’t at a much higher level of abstraction than relativity… states are a way to organize a world-model, and a world-model is a way of understanding the world.
From a normative perspective, relativity suggests that there’s ultimately going to be something wrong with designing agents to think in states; states make specific assumptions about time which turn out to be restrictive (in particular, that there’s a single global “now” in which the world has one definite configuration, whereas relativity makes simultaneity frame-dependent).
From a descriptive perspective, relativity suggests that agents won’t convergently think in states, because doing so doesn’t reflect the world perfectly.
The way we think about agents shouldn’t depend on how we think about physics, but it accidentally did, in that we baked linear time into some agent designs. So the reason relativity is able to say something about agent design, here, is because it points out that some agent designs are needlessly restrictive, and rational agents can take more general forms (and probably should).
This is not an argument against an agent carrying internal state, just an argument against using POMDPs to model everything.
Also, it’s pedantic; if you give me an agent model in the POMDP framework, there are probably more interesting things to talk about than whether it should be in the POMDP framework. But I would complain if POMDPs were a central assumption needed to prove a significant claim about rational agents, or something like that. (To give an extreme example, if someone used POMDP-agents to argue against the rationality of assenting to relativity.)
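(For reference, the standard POMDP formalism, in textbook notation rather than anything specific to this thread, makes the baked-in assumption visible.)

```latex
% Standard POMDP tuple, textbook notation (nothing specific to this thread).
\langle S, A, \Omega, T, O, R \rangle:
\quad T(s' \mid s, a),\quad O(o \mid s', a),\quad R(s, a).
% The interaction protocol assumes one global clock t = 0, 1, 2, \dots,
% with the environment in exactly one state s_t at each tick. It's this
% "one definite world-state per global timestep" structure that bakes
% linear time into the model, independent of any particular agent design.
```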
Hmm, you mean that reward is taken as observable? Yeah, this does seem like a significant drawback of talking about rewards. But if we assume that rewards are unobservable, I don’t see why reward functions aren’t expressive enough to encode utilitarianism—just let the reward at each timestep be net happiness at that timestep. Then we can describe utilitarians as trying to maximise reward.
I would complain significantly less about this, yeah. However, the relativity objection stands.
I think we’re talking about different debates here. I agree with the statement above—but the follow-up debate which I’m interested in is the comparison between “utility theory” and “a naive conception of goals and beliefs” (in philosophical parlance, the folk theory), and so this actually seems like a point in favour of the latter. What does utility theory add to the folk theory of agency?
To state the obvious, it adds formality. For formal treatments, there isn’t much of a competition between naive goals and utility theory: utility theory wins by default, because naive goal theory doesn’t show up to the debate.
If I thought “goals” were a better way of thinking than “utility functions”, I would probably be working on formalizing goal theory. In reality, though, I think utility theory is essentially what you get when you try to do this.
Here’s one example: utility theory says that deontological goals are very complicated. To me, it seems like folk theory wins this one, because lots of people have pretty deontological goals.
So, my theory is not that it is always better to describe realistic agents as pursuing (simple) goals. Rather, I think it is often better to describe realistic agents as following simple policies. It’s just that simple utility functions are often enough a good explanation that I want to also think in those terms.
Deontological ethics tags actions as good and bad, so it’s essentially about policy. So the descriptive utility follows from the usefulness of the policy view. [The normative utility is less obvious, but there are several reasons why this can be normatively useful; e.g., it’s easier to copy than consequentialist ethics, it’s easier to trust deontological agents (they’re more predictable), etc.]
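(To make the “simple as a policy, complicated as a utility function” contrast concrete, here is a toy sketch; all names and structures are hypothetical, purely for illustration.)

```python
# Toy contrast: the same "never lie" constraint described two ways.
# All names/structures are hypothetical, purely illustrative.

# 1. As a policy: deontology tags actions directly, so the description is short.
def deontological_policy(available_actions, choose):
    """Filter out forbidden actions, then choose among what's left."""
    permitted = [a for a in available_actions if not a.get("is_lie", False)]
    return choose(permitted)

# 2. As a utility function over outcomes: recovering the same behaviour requires
#    valuing entire world-histories (did a lie ever occur anywhere in them?),
#    a far more complicated object than a tag on actions.
def deontological_utility(world_history):
    if any(step["action"].get("is_lie", False) for step in world_history):
        return float("-inf")  # lying is forbidden regardless of consequences
    return sum(step.get("happiness", 0.0) for step in world_history)
```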
To state it a little more thoroughly:
A good first approximation is the prior where agents have simple policies. (This is basically treating agents as regular objects, and investigating the behavior of those objects.)
Many cases where that does not work well are handled much better by assuming simple utility functions and simple beliefs. So, it is useful to sloppily combine the two.
An even better combination of the two conceives of an agent as a model-based learner who is optimizing a policy. This combines policy simplicity with utility simplicity in a sophisticated way. Of course, even better models are also possible.
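(A very rough sketch of that third option, purely illustrative and not anyone’s actual proposal: the agent keeps a simple world-model and a simple utility function, and uses them to improve a simple cached policy.)

```python
# Rough sketch of "model-based learner optimizing a policy".
# Purely illustrative; every name here is hypothetical.
class ModelBasedAgent:
    def __init__(self, world_model, utility, policy, horizon=10):
        self.world_model = world_model  # (state, action) -> next state (learned)
        self.utility = utility          # state -> value (the "simple goal")
        self.policy = policy            # state -> action (the "simple policy")
        self.horizon = horizon

    def plan(self, state, candidate_actions):
        """Score each candidate action by an imagined rollout under the model."""
        def rollout_value(first_action):
            s, total, a = state, 0.0, first_action
            for _ in range(self.horizon):
                s = self.world_model(s, a)
                total += self.utility(s)
                a = self.policy(s)  # follow the cached policy after the first step
            return total
        best = max(candidate_actions, key=rollout_value)
        # A real learner would also update self.policy toward `best` here,
        # which is where policy simplicity and utility simplicity combine.
        return best
```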
Or another example: utility theory says that there’s a single type of entity to which we assign value. Folk theory doesn’t have a type system for goals, and again that seems more accurate to me (we have meta-goals, etc).
I’m not sure what you mean, but I suspect I just agree with this point. Utility functions are bad because they require an input type such as “worlds”. Utility theory, on the other hand, can still be saved by considering expectation functions (which can measure the expectation of arbitrary propositions, linear combinations of propositions, etc.). This allows us to talk about meta-goals as expectations-of-goals (“I don’t think I should want pizza”).
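(For readers who haven’t seen it: the Jeffrey-style “expectation function” move, in its standard formulation, attaches value to propositions rather than worlds via the averaging axiom below; this is my summary of the standard setup, not a quote from anyone here.)

```latex
% Jeffrey-style desirability (standard formulation): value attaches to
% propositions, not worlds. For mutually exclusive A, B with P(A \vee B) > 0:
V(A \vee B) \;=\; \frac{P(A)\,V(A) + P(B)\,V(B)}{P(A) + P(B)}.
% Since V behaves as a conditional expectation over the proposition algebra, it
% applies to propositions about one's own goals as well, which is what lets
% meta-goals ("I shouldn't want pizza") live inside the same framework.
```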
To be clear, I do think that there are a bunch of things which the folk theory misses (mostly to do with probabilistic reasoning) and which utility theory highlights. But on the fundamental question of the content of goals (e.g. will they be more like “actually obey humans” or “tile the universe with tiny humans saying ‘good job’”) I’m not sure how much utility theory adds.
Again, it would seem to add formality, which seems pretty useful.
Here are two ways to relate to formality. Approach 1: this formal system is much less useful for thinking about the phenomenon than our intuitive understanding, but we should keep developing it anyway because eventually it may overtake our intuitive understanding.
Approach 2: by formalising our intuitive understanding, we have already improved it. When we make arguments about the phenomenon, using concepts from the formalism is better than using our intuitive concepts.
I have no problem with approach 1; most formalisms start off bad, and get better over time. But it seems like a lot of people around here are taking the latter approach, and believe that the formalism of utility theory should be the primary lens by which we think about the goals of AGIs.
I’m not sure if you defend the latter. If you do, then it’s not sufficient to say that utility theory adds formalism, you also need to explain why that formalism is net positive for our understanding. When you’re talking about complex systems, there are plenty of ways that formalisms can harm our understanding. E.g. I’d say behaviourism in psychology was more formal and also less correct than intuitive psychology. So even though it made a bunch of contributions to our understanding of RL, which have been very useful, at the time people should have thought of it using approach 1 not approach 2. I think of utility theory in a similar way to how I think of behaviourism: it’s a useful supplementary lens to see things through, but (currently) highly misleading as a main lens to see things like AI risk arguments through.
If I thought “goals” were a better way of thinking than “utility functions”, I would probably be working on formalizing goal theory.
See my point above. You can believe that “goals” are a better way of thinking than “utility functions” while still believing that working on utility functions is more valuable. (Indeed, “utility functions” seem to be what “formalising goal theory” looks like!)
Utility theory, on the other hand, can still be saved
Oh, cool. I haven’t thought about the Jeffrey-Bolker approach enough to engage with it here, but I’ll tentatively withdraw this objection in the context of utility theory.
From a descriptive perspective, relativity suggests that agents won’t convergently think in states, because doing so doesn’t reflect the world perfectly.
I still strongly disagree (with what I think you’re saying). There are lots of different problems which agents will need to think about. Some of these problems (which involve relativity) are more physically fundamental. But that doesn’t mean that the types of thinking which help solve them need to be more mentally fundamental to our agents. Our thinking doesn’t reflect relativity very well (especially on the intuitive level which shapes our goals the most), but we manage to reason about it alright at a high level. Instead, our thinking is shaped most to be useful for the types of problems we tend to encounter at human scales; and we should expect our agents to also converge to thinking in whatever way is most useful for the majority of problems which they face, which likely won’t involve relativity much.
(I think this argument also informs our disagreement about the normative claim, but that seems like a trickier one to dig into, so I’ll skip it for now.)
If agents think in terms of states, then our model of agent goals should involve states regardless of whether that maps onto physics.
Realistic agents don’t have the option of thinking in terms of detailed world states anyway, so the relativistic objection is the least of their worries.