Some nitpicks about your risk model slash ways in which my risk model differs from yours:
1. I think AIs on Earth are likely to be more homogeneous than you expect; even in a slow takeoff they might all be rather similar to each other. Partly for the reasons Evan discusses in his post, and partly because of acausal shenanigans. I certainly think that, unfortunately, given all the problems you describe, we should count ourselves lucky if any of the contending AI factions are aligned to our values. I think this is an important research area.
2. I am perhaps more optimistic than you that if at least one of the contending AI factions is aligned to our values, things will work out pretty well for us. I’m hopeful that the AI factions will negotiate and compromise rather than fight (even though, as you point out, the unaligned ones may have various advantages). I feel about 80% confident; how about you? I’d love to hear more about this; I think it’s an important research area.
3. You speak of AGIs escaping into the wild. I think that’s a possibility, but I’m somewhat more concerned about AGIs taking over the institutions that built them. Institutions that build AIs will presumably be trying to use them for some intellectual purpose, and will presumably be somewhat optimistic (too optimistic, IMO) that the AI they built is aligned to them. So rather than escaping and setting up shop on some hacked server somewhere, I expect the most likely scenario to be something like “The AI is engaging and witty and sympathetic and charismatic, and behaves very nicely, and gradually the institution that built it comes to rely more and more on its suggestions and trust it more and more, until eventually it is basically the power behind the throne, steering the entire institution even as those inside think that they are still in charge.”
Thanks!
For homogeneity, I guess I was mainly thinking that in the era of not-knowing-how-to-align-an-AGI, people would tend to try lots of different new things, because nothing so far has worked. I agree that once there’s an aligned AGI, it’s likely to get copied, and if new better AGIs are trained, people may be inclined to try to keep the procedure as close as possible to what’s worked before.
I hadn’t thought about whether different AGIs with different goals are likely to compromise vs. fight. There’s Wei Dai’s argument that compromise is very easy with AGIs because they can “merge their utility functions”. But this kind of AGI, at least, doesn’t have a utility function… maybe there’s a way to do something like that with multiple parallel value functions, but I’m not sure that would actually work. There are also old posts about AGIs checking each other’s source code for sincerity, but can they actually understand what they’re looking at? Transparency is hard. And how do they verify that there isn’t a backup stashed somewhere else, ready to jump out at a later date and betray the agreement? Also, humans have social instincts that AGIs don’t, which pushes in both directions, I think. And humans are easier to kill / easier to credibly threaten. I dunno. I’m not inclined to have confidence in any direction.
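To make that slightly more concrete, here is a toy sketch of the distinction I have in mind: literally maximizing a merged (weighted-sum) utility function versus keeping two parallel value functions that both have to endorse any move away from the status quo. This is purely my own illustration; the outcomes, values, and weights are made up, and nothing here settles whether real AGIs could actually do this.

```python
# Toy illustration only: two agents score outcomes, and we compare a
# "merged utility function" policy against a "parallel value functions"
# policy that only acts when both agents endorse the move. All outcomes
# and numbers below are invented for the example.
from typing import Callable, List

Outcome = str
ValueFn = Callable[[Outcome], float]

def merged_policy(options: List[Outcome], v1: ValueFn, v2: ValueFn,
                  w1: float = 0.5, w2: float = 0.5) -> Outcome:
    """Pick the outcome maximizing a weighted sum of the two value functions
    (the 'merge their utility functions' idea)."""
    return max(options, key=lambda o: w1 * v1(o) + w2 * v2(o))

def parallel_policy(options: List[Outcome], v1: ValueFn, v2: ValueFn,
                    status_quo: Outcome) -> Outcome:
    """Only move away from the status quo if both value functions weakly
    prefer the new outcome (a crude 'parallel value functions' compromise)."""
    acceptable = [o for o in options
                  if v1(o) >= v1(status_quo) and v2(o) >= v2(status_quo)]
    return max(acceptable, key=lambda o: v1(o) + v2(o)) if acceptable else status_quo

# Hypothetical preferences for two AI factions:
values_a = {"fight": 1.0, "split_resources": 3.0, "status_quo": 2.0}
values_b = {"fight": 0.5, "split_resources": 2.5, "status_quo": 2.0}
options = list(values_a)

print(merged_policy(options, values_a.__getitem__, values_b.__getitem__))
# -> split_resources
print(parallel_policy(options, values_a.__getitem__, values_b.__getitem__, "status_quo"))
# -> split_resources
```

The point of the sketch is just that the “merge” move assumes both sides have something like a stable, comparable scoring of outcomes; if the AGIs are value-function-based and their values shift with learning, it’s not obvious the merged object stays meaningful, which is part of why I’m unsure it would work.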
I agree that if a sufficiently smart misaligned AGI is running on a nice supercomputer somewhere, it would have every reason to stay right there and pursue its goals within that institution, and also every reason to try to escape and self-replicate elsewhere in the world. I guess we can be concerned about both. :-/
So rather than escaping and setting up shop on some hacked server somewhere, I expect the most likely scenario to be something like “The AI is engaging and witty and sympathetic and charismatic [...]”
(I’m new to thinking about this and would find responses and pointers really helpful.) In my head this scenario felt unrealistic because I expect transformative-ish AI applications to come along before highly sophisticated AIs start socially manipulating their designers. Just for the sake of illustration, I was thinking of things like stock investment AIs, product design AIs, military strategy AIs, companionship AIs, and question-answering AIs, which all seem to have the potential to throw major curveballs. The associated incidents would update safety culture enough to make the classic “AGI arguing itself out of a box” scenario unlikely. So I would worry more about scenarios where companies or governments feel their hands are tied and have to allow usage of, or rely on, potentially transformative AI systems.
I think this is a very important and neglected area of research. My take differs from yours, but I’m very unconfident in it; you might be right. I’m glad you are thinking about this and would love to chat more about it with you.
Stock investment AIs seem like they would make lots of money, which would accelerate timelines by causing loads more money to be spent on AI. But other than that, they don’t seem that relevant? Like, how could they cause a point of no return?
Product design AIs and question-answering AIs seem similar. Maybe they’ll accelerate timelines, but other than that, they won’t be causing a point of no return (unless they have gotten so generally intelligent that they can start strategically manipulating us via their products and answers, which I think would happen eventually, but by the time that happens there will probably be agenty AIs running around too).
Companionship AIs seem like the sort of thing that would be engaging and witty and charismatic; or at the very least, insofar as companionship AIs become a big deal, AIs that can argue themselves out of the box aren’t far behind.
Military strategy AIs seem similar to me if they can talk/understand language (convincing people of things is something you can strategize about too). Maybe we can imagine a kind of military strategy AI that doesn’t really do language well; maybe instead it just has really good battle simulators and generalized tactical skill that lets it issue commands to troops that are likely to win battles. But (a) I think this is unlikely, and (b) I think it isn’t super relevant anyway, since tactical skill isn’t very important. It’s not like we are currently fighting a conventional war where better front-line tactics would let us break through the line or something.
This seems very relevant: https://www.gwern.net/Tool-AI