Interestingly, I agree with you that philosophers could make important contributions to AI safety, but my list of things that I’d want them to investigate is almost entirely disjoint from yours.
The most important points on my list:
1. Investigating how to think about agency and goal-directed behaviour, along the lines of Dennett’s work on the intentional stance. How do they relate to intelligence and the ability to generalise across widely different domains? These are crucial concepts which are still very vague.
2. Laying out the arguments for AGI being dangerous as rigorously and comprehensively as possible, noting the assumptions which are being made and how plausible they are.
3. Evaluating the assumptions about the decomposability of cognitive work which underlie debate and IDA (in particular: the universality of humans, and the role of language).
I should have written something to encourage others to list their own problems that philosophers can contribute to, but was in a hurry to move my existing comment over to this forum and forgot, so thanks for doing that anyway.
About your list, I agree with 3 (and was planning to add something like it as a bullet point under “metaphilosophy” but again forgot).
For 1, I’m not sure there’s much for philosophers to do there, which may in part be because I don’t correctly understand Dennett’s work on the intentional stance. I wonder if you can explain more or point to an explanation of Dennett’s work that would make sense for someone like me? (Is he saying anything beyond that it’s sometimes useful to make a prediction by treating something as if it’s a rational agent, which by itself seems like a rather trivial point?)
For 2, it seems like rather than being disjoint from my list, it actually subsumes a number of items on my list. For example, one reason I think AGI will likely be dangerous is that it will likely differentially accelerate other forms of intellectual progress relative to philosophical progress. So wouldn’t laying out this argument “as rigorously and comprehensively as possible” involve making substantial progress on that item on my list?
On 1: I think there’s a huge amount for philosophers to do. I think of Dennett as laying some of the groundwork which will make the rest of that work easier (such as identifying that the key question is when it’s useful to use an intentional stance, rather than trying to figure out which things are objectively “agents”) but the key details are still very vague. Maybe the crux of our disagreement here is how well-specified “treating something as if it’s a rational agent” actually is. I think that definitions in terms of utility functions just aren’t very helpful, and so we need more conceptual analysis of what phrases like this actually mean, which philosophers are best-suited to provide.
On 2: you’re right, as written it does subsume parts of your list. I guess when I wrote that I was thinking that most of the value would come from clarification of the most well-known arguments (i.e. the ones laid out in Superintelligence and What Failure Looks Like). I endorse philosophers pursuing all the items on your list, but from my perspective the disjoint items on my list are much higher priorities.
“I think that definitions in terms of utility functions just aren’t very helpful”
I agree this seems like a good candidate for our crux. It seems to me that defining “rational agent” in terms of “utility function” is both intuitively and theoretically quite appealing and really useful in practice (see the whole field of economics), and I’m pretty puzzled by your persistent belief that maybe we can do much better.
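For concreteness, here is a minimal sketch of what "rational agent defined in terms of a utility function" usually amounts to in the textbook sense: the agent picks whichever action has the highest expected utility. Everything in it (the names `expected_utility` and `choose`, the toy actions, the numbers) is a hypothetical illustration rather than anything taken from this thread or from the papers mentioned.

```python
from typing import Callable, Dict

# A lottery maps each outcome label to its probability (assumed to sum to 1).
Lottery = Dict[str, float]


def expected_utility(lottery: Lottery, utility: Callable[[str], float]) -> float:
    """Probability-weighted average of the utility of each outcome."""
    return sum(prob * utility(outcome) for outcome, prob in lottery.items())


def choose(actions: Dict[str, Lottery], utility: Callable[[str], float]) -> str:
    """Return the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


# Hypothetical toy decision problem: a sure thing versus a coin flip.
actions = {
    "safe": {"small_prize": 1.0},
    "gamble": {"big_prize": 0.5, "nothing": 0.5},
}

# Hypothetical utility function, given as a lookup table over outcomes.
utility_table = {"nothing": 0.0, "small_prize": 6.0, "big_prize": 10.0}


def utility(outcome: str) -> float:
    return utility_table[outcome]


best_action = choose(actions, utility)
```

Note how much the sketch takes as given: a fixed action set, a fixed outcome space, and known probabilities. Whether those can be pinned down for a real-world system is arguably part of what is at issue in the disagreement here.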
The recent paper “Designing agent incentives to avoid reward tampering” also seems relevant here, as it gives a seemingly clear explanation of why, if you started with an RL agent, you might want to move to a decision-theoretic agent (i.e., something that has a utility function) instead. I wonder if that changes your mind at all.
I wonder if your intuition is along the lines of, an agent is a computational process that is trying to accomplish some real world objective, where “trying” etc. are all cashed out in a lot more detail (similar to how it would need to be cashed out to clarify Paul’s notion of intent alignment). If so, I think I’m sympathetic to this.
AFAICT, the main argument for your position is “Coherent behaviour in the real world is an incoherent concept”, but I feel like I gave a strong counter-argument against it, and I’m not sure what your counter-counter-argument is.