Here are two ways to relate to formality. Approach 1: this formal system is much less useful for thinking about the phenomenon than our intuitive understanding, but we should keep developing it anyway because eventually it may overtake our intuitive understanding.
Approach 2: by formalising our intuitive understanding, we have already improved it. When we make arguments about the phenomenon, using concepts from the formalism is better than using our intuitive concepts.
I have no problem with approach 1; most formalisms start off bad and get better over time. But it seems like a lot of people around here are taking the latter approach, and believe that the formalism of utility theory should be the primary lens through which we think about the goals of AGIs.
I’m not sure if you defend the latter. If you do, then it’s not sufficient to say that utility theory adds formalism; you also need to explain why that formalism is net positive for our understanding. When you’re talking about complex systems, there are plenty of ways that formalisms can harm our understanding. E.g. I’d say behaviourism in psychology was more formal, and also less correct, than intuitive psychology. So even though it made a bunch of contributions to our understanding of RL, which have been very useful, at the time people should have thought of it using approach 1, not approach 2. I think of utility theory in a similar way to how I think of behaviourism: it’s a useful supplementary lens to see things through, but (currently) a highly misleading main lens for viewing things like AI risk arguments.
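(To be concrete, the formalism I have in mind is the familiar expected-utility picture, roughly: a VNM-style representation theorem says that if an agent’s preferences \(\succsim\) over lotteries satisfy completeness, transitivity, continuity and independence, then there is some real-valued \(u\) over outcomes such that

\[
p \succsim q \iff \sum_{o} p(o)\,u(o) \ge \sum_{o} q(o)\,u(o),
\]

i.e. the agent’s “goals” get summarised by a single utility function over outcomes. The notation here is just my shorthand for the standard statement, not anything load-bearing for the argument.)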
> If I thought “goals” were a better way of thinking than “utility functions”, I would probably be working on formalizing goal theory.
See my point above. You can believe that “goals” are a better way of thinking than “utility functions” while still believing that working on utility functions is more valuable. (Indeed, “utility functions” seem to be what “formalising goal theory” looks like!)
> Utility theory, on the other hand, can still be saved
Oh, cool. I haven’t thought about the Jeffrey-Bolker approach enough to engage with it here, but I’ll tentatively withdraw this objection in the context of utility theory.
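(For readers who haven’t seen it: as I understand the Jeffrey-Bolker framework, desirability attaches to propositions rather than to fully specified world-states, with the central constraint being roughly that, for incompatible propositions \(A\) and \(B\) with \(P(A \lor B) > 0\),

\[
V(A \lor B) = \frac{P(A)\,V(A) + P(B)\,V(B)}{P(A) + P(B)},
\]

so utility is defined over an algebra of propositions, and nothing forces the agent to carve the world into maximally specific states. That sketch is from memory, so treat the details with caution.)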
> From a descriptive perspective, relativity suggests that agents won’t convergently think in states, because doing so doesn’t reflect the world perfectly.
I still strongly disagree (with what I think you’re saying). There are lots of different problems which agents will need to think about. Some of these problems (the ones involving relativity) are more physically fundamental, but that doesn’t mean that the types of thinking which help solve them need to be more mentally fundamental to our agents. Our thinking doesn’t reflect relativity very well (especially on the intuitive level which shapes our goals the most), yet we manage to reason about it alright at a high level. Instead, our thinking is shaped mostly to be useful for the types of problems we tend to encounter at human scales; and we should expect our agents to converge on thinking in whatever way is most useful for the majority of problems they face, which likely won’t involve relativity much.
(I think this argument also informs our disagreement about the normative claim, but that seems like a trickier one to dig into, so I’ll skip it for now.)