Regardless of how good their alignment plans are, the thing that makes OpenAI unambiguously evil is that they created a strongly marketed public product and, as a result, caused a lot of public excitement about AI, which in turn led to the creation of lots of other AI capabilities organizations that are completely dismissive of safety.
There’s just no good reason to do that, except short-term greed at the cost of higher probability that everyone (including people at OpenAI) dies.
(No, “you need huge profits to solve alignment” isn’t a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.)
Unambiguously evil seems unnecessarily strong. Something like “almost certainly misguided” might be more appropriate? (still strong, but arguably defensible)
Taboo “evil” (locally, in contexts like this one)?
Here the thing that I’m calling evil is pursuing short-term profits at the cost of a non-negligibly higher risk that everyone dies.
It’s also generally very questionable that they started creating models for research, then seamlessly pivoted to commercial exploitation without changing any of their practices. A prototype meant as a proof of concept isn’t the same as a safe finished product you can sell. Honestly, only in software and ML do we get people doing such shoddy engineering.
“No, ‘you need huge profits to solve alignment’ isn’t a good excuse — we had nowhere near exhausted the alignment research that can be done without huge profits.”
This seems insufficiently argued; the existence of any alignment research that can be done without huge profits is not enough to establish that you don’t need huge profits to solve alignment (particularly when considering things like how long timelines are even absent your intervention).
To be clear, I agree that OpenAI are doing evil by creating AI hype.
This is too strong. For example, releasing the product would be correct if someone else would do something similar soon anyway, you’re safer than them, and releasing first lets you capture more of the free energy. (That’s not the case here, but it’s not as straightforward as you suggest, especially with your “Regardless of how good their alignment plans are” and your claim “There’s just no good reason to do that, except short-term greed”.)
OpenAI is not evil. They are just defecting on an epistemic prisoner’s dilemma.
This doesn’t even address their stated reason/excuse for pushing straight for AGI.
I don’t have a link handy, but Altman has said that short timelines and a slow takeoff are a good scenario for AI safety. Pushing for AGI now raises the odds that, when we get near it, it won’t get 100x smarter or more prolific rapidly. And I think that’s right, as far as it goes. It needs to be weighed against the argument for more alignment research before approaching AGI, but doing that weighing is not trivial. I don’t think there’s a clear winner.
Now, Altman pursuing more compute with his “7T investment” push really undercuts that argument being his sincere opinion, at least now (he said that bit a while ago, maybe 5 years back?).
But even if Altman was or is lying, that doesn’t make that thesis wrong. This might be the safest route to AGI. I haven’t seen anyone even try in good faith to weigh the complexities of the two arguments against each other.
Now, you can still say that this is evil, because the obviously better path is to do decades and generations of alignment work prior to getting anywhere near AGI. But that’s simply not going to happen.
One reason that goes overlooked is that most human beings are not utilitarians. Even if they realize we’re lowering the odds of future humans having an amazing, abundant future, they are pursuing AGI right now because it might prevent them and many of those they love from dying painfully. This is terribly selfish from a utilitarian perspective, but reason does not cross the is/ought gap to make utilitarianism any more rational than selfishness. I think calling selfishness “evil” is ultimately correct, but it’s not obvious. And by that standard, most of humanity is currently evil.
And in this case, evil intentions still might have good outcomes. While OpenAI has no good alignment plan, neither does anyone else. Humanity is simply not going to pause all AI work to study alignment for generations, so plans that include substantial slowdown are not good plans. So fast timelines with a slow takeoff based on lack of compute might still be the best chance we’ve got. Again, I don’t know and I don’t think anyone else does, either.
“One reason that goes overlooked is that most human beings are not utilitarians”
I think this point is just straightforwardly wrong. Even from a purely selfish perspective, it’s reasonable to want to stop AI.
The main reason humanity is not going to stop seems to be coordination problems, or something close to learned helplessness in these kinds of competitive dynamics.
I’m not sure that’s true. It’s true if you adopt the dominant local perspective that “alignment is very hard and we need more time to do it”. But there are other perspectives: see “AI is easy to control” by Pope & Belrose, arguing that the success of RLHF means there’s a less than 1% risk of extinction from AI. I think this perspective is both subtly wrong and deeply confused in conflating alignment with total x-risk, but the core argument isn’t obviously wrong. So reasonable people can and do argue for full speed ahead on AGI.
I agree with pretty much all of the counterarguments made by Steve Byrnes in his Thoughts on “AI is easy to control” by Pope & Belrose. But not all reasonable people will. And those who are also non-utilitarians (most of humanity) will be pursuing AGI ASAP for rational (if ultimately subtly wrong) reasons.
I think we need to understand and take this position seriously to do a good job of avoiding extinction as best we can.
Basically, I think whether or not one thinks alignment is hard is much more of the crux than whether or not they’re utilitarian.
Personally, I don’t find Pope & Belrose very convincing, although I do commend them for the reasonable effort. But if I did believe that AI is likely to go well, I’d probably also be all for it. I just don’t see how this is related to utilitarianism (except maybe for a very small subset of people in EA).
IMO the proportion of effort going into AI alignment research scales with total AI investment. Lots of AI labs do alignment research themselves and open-source/release research on the matter.
OpenAI at least ostensibly has a mission. If OpenAI didn’t make the moves they did, Google would have their spot, and Google is closer to the “evil self-serving corporation” archetype than OpenAI is.
Can we quantify the value of theoretical alignment research before and after ChatGPT?
For example, mech interp research seems much more practical now. If alignment proves to be more of an engineering problem than a theoretical one, then I don’t see how you can meaningfully make progress without precursor models.
Furthermore, given that nearly everyone with a lot of GPUs is getting results similar to OAI’s (where similar means within 1 OOM), it’s likely that someone would eventually have stumbled upon AGI with the compute of the 2030s anyway.
Let’s say their secret sauce gives them the equivalent of one extra hardware generation (and even that is pretty generous). That’s only ~2-3 years. Meta built a $10B data center to match TikTok’s content algorithm, and that datacenter, meant to decide which videos to show to users, happened to catch up to GPT-4!
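For what it’s worth, here is a minimal back-of-envelope sketch of how a 1-OOM lead maps onto years. The ~9-month doubling time for effective training compute is my own illustrative assumption, not a figure from the comment above; under it, a one-OOM lead works out to roughly 2-3 years, consistent with the one-hardware-generation estimate.

```python
import math

# Back-of-envelope sketch: how long does a 1-OOM effective-compute lead last?
# The ~9-month doubling time below is an assumption for illustration only.
doubling_time_years = 0.75   # assumed doubling time for effective training compute
lead_ooms = 1.0              # hypothesized "secret sauce" lead, in orders of magnitude

doublings_to_close = lead_ooms * math.log2(10)   # ~3.3 doublings per OOM
years_to_close = doublings_to_close * doubling_time_years

print(f"{lead_ooms:.0f} OOM ~= {doublings_to_close:.1f} doublings "
      f"~= {years_to_close:.1f} years at a {doubling_time_years * 12:.0f}-month doubling time")
```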
I suspect the “ease” of making GPT-3/4 informed OAI’s choice to publicize their results.
I wonder if you’re getting disagreement strictly over that last line. I think that all makes sense, but I strongly suspect that the ease of making ChatGPT had nothing to do with their decision to publicize and commercialize.
There’s little reason to think that alignment is an engineering problem to the exclusion of theory. But making good theory is also partly dependent on knowing about the system you’re addressing, so I think there’s a strong argument that that progress accelerated alignment work as much as it did capabilities.
I think the argument is that it would be way better to do all the work we could on alignment before advancing capabilities at all. Which it would be. If we were not only a wise species, but a universally utilitarian one (see my top level response on that if you care). Which we are decidedly not.