If these paths are viable, I desire to believe that they are viable.
If these paths are nonviable, I desire to believe that they are nonviable.
Does it do any good to take well-meaning optimistic suggestions seriously if they will in fact clearly not work? Obviously, if they will work, by all means we should discover that, because knowing which of those paths, if any, is most likely to work is galactically important. But I don’t think they’ve been dismissed just because people thought the optimists needed to be taken down a peg. Reality does not owe us a reason for optimism.
Generally, when people are optimistic about one of those paths, it is not because they have given it deep thought and concluded that it is viable; it is because they are unaware of the voluminous debate and the reasons to believe it will not work at all. And inasmuch as they insist on that path in the face of those arguments, it is often because they lack security mindset: they are “looking for ways that things could work out” without considering how plausible or actionable each step on that path would actually be. If that’s the mode they’re in, then I don’t see how encouraging their optimism helps with the problem.
Is the argument that any effort spent on any of those paths is worthwhile compared to thinking that nothing can be done?
edit: Of course, misplaced pessimism is just as disastrous. And on rereading, was that your argument? Sorry if I reacted to something you didn’t say; if that’s the take, I agree fully. If one of those approaches is in fact viable, dismissing it is every bit as destructive as false hope. I just think the crux is whether it is, in fact, viable, and how to discover that.
I have a reasonably low value for p(Doom). I also think these approaches (to the extent they are courses of action) are not really viable. However, as long as they don’t increase p(Doom), it’s fine to pursue them. Two considerations matter here: an unviable approach may still slightly reduce p(Doom) or delay Doom, and the resources spent on unviable approaches don’t necessarily detract from the resources available for viable ones.
For example, “we’ll pressure corporations to take these problems seriously”, while unviable as a solution, will tend to marginally reduce the amount of money flowing into AI research, marginally increase the degree to which AI researchers have to consider AI risk, and marginally increase the resources focused on AI risk. Resources used in pressuring corporations are unlikely to have any effect which increases AI risk. So, while this path is unviable, pursuing it in the absence of a viable strategy seems slightly positive.
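To make that weighing explicit, here is a minimal sketch of the comparison, with entirely made-up numbers; the variable names and figures are illustrative assumptions, not estimates of anything.

```python
# Toy sketch of the comparison above. Every number is an illustrative
# assumption, not an estimate.

p_doom_baseline = 0.10   # assumed p(Doom) if the "unviable" path is not pursued
direct_effect = -0.002   # assumed marginal reduction from corporate pressure
crowd_out_effect = 0.0   # assumed: no resources diverted away from viable approaches

p_doom_with_action = p_doom_baseline + direct_effect + crowd_out_effect

# The claim is only that pursuit is fine so long as it does not raise p(Doom).
worth_pursuing = p_doom_with_action <= p_doom_baseline
print(f"p(Doom) with action: {p_doom_with_action:.3f}, worth pursuing: {worth_pursuing}")
```

The second consideration above corresponds to crowd_out_effect staying near zero; if pressuring corporations did pull resources away from viable work, that term would go positive and the conclusion could flip.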
Resources used in pressuring corporations are unlikely to have any effect which increases AI risk.
Devil’s advocate: if this unevenly delays the corporations that are sensitive to public concerns, and those are also the corporations taking alignment at least somewhat seriously, we get a later but less safe takeoff. Though this goes for almost any intervention, including, to some extent, regulatory ones.
Yes. An example of how this could go disastrously wrong is if US research gets regulated but Chinese research continues apace, and China ends up winning the race with a particularly unsafe AGI.
What is p(Doom | Action), in your view?

I’m actually optimistic about prosaic alignment for a takeoff driven by language models. But I don’t know what the opportunity for action is there: I expect DeepMind to trigger the singularity, and they’re famously opaque. Call it a 15% chance of not-doom, action or no action. To be clear, I think action is possible, but I don’t know who would do it or what form it would take. Convince OpenAI and race DeepMind to a working prototype? This is exactly the scenario we hoped not to be in...
edit: I think possibly step 1 is to prove that Transformers can scale to AGI. Find a list of remaining problems and knock them down, preferably in toy environments with weaker models. The difficulty is obviously demonstrating the danger without instantiating it. Create a fire alarm, somehow. The hard part in judging this action is that it both helps and harms.
edit: Regulatory action may buy us a few years! I don’t see how we can get it, though.
I’m definitely on board with prosaic alignment via language models. There are a few different projects I’ve seen in this community related to that approach, including ELK and the project to teach a GPT-3-like model to produce violence-free stories. I think these are good things.
I don’t understand why you would want to spend any effort proving that transformers could scale to AGI. Either they can or they can’t. If they can, then proving that they can will only accelerate the problem. If they can’t, then prosaic alignment will turn out to be a waste of time, but only in the sense that every lightbulb Edison tested was a waste of time (except for the one that worked). This is what I mean by a shotgun approach.
I don’t understand why you would want to spend any effort proving that transformers could scale to AGI.
The point would be to try to create common knowledge that they can. Otherwise, for any “we decided not to do X”, someone else will try doing X, and the problem remains.
Humanity is already taking a shotgun approach to unaligned AGI. Shotgunning safety is viable and important, but I think it’s more urgent to prevent the first shotgun from hitting an artery. Demonstrating AGI viability in this analogy is shotgunning a pig in the town square, to prove to everyone that the guns we are building can in fact kill.
We want safety to have as many pellets in flight as possible. But we want unaligned AGI to have as few pellets in flight as possible. (Preferably none.)