I don’t think the concept of infohazard as applied to AI alignment/safety has anything to do with the Great Man Theory. If we bought the Great Man Theory, we would also have to believe that at any time a random genius could develop ASI using only their laptop and unleash it onto the world, in which case, any hope of control is moot. Most people who support AI governance don’t believe things are quite that extreme, and think that strategies ranging from “controlling compute” to “making it socially disreputable to work on AI capabilities” may effectively delay the development of AGI by significantly hurting the big collaborative projects.
Yeah, this is definitely something that’s more MIRI-specific, though I’d make the case that the infohazard concept as used by the LW community kinda does invite the Great Man Theory of science and technology, because infohazards tend to connote the idea that there are ridiculously impactful technologies that can be found by small groups. But yeah, I do buy that this is a more minor problem, compared to the other problems with infohazards.
We live in a world in which economic incentives have aligned things so that the power of numbers lies on the side of capabilities.
My fundamental crux here is that this is ultimately going to result in more high-quality alignment work being done than LW will do, and the incentives for capabilities also result in incentives for AI safety and control. A lot of this fundamentally comes down to companies internalizing the costs of AI not being controlled far more than is usual. Plus there are direct incentives to control AI, because an uncontrollable AI is not nearly as useful to companies as LW thinks, which also implies that the profit incentives will go to solving AI control problems.
because an uncontrollable AI is not nearly as useful to companies as LW thinks
No one thinks it is. An uncontrollably warming world isn’t very useful to fossil fuel companies either, but fossil fuel companies can’t afford to care about that, because it’s too long-term and they have to optimize profits on a much shorter time horizon. The argument isn’t “evil companies profit from unleashing unaligned AI”, it’s “dumb, badly coordinated companies unleash unaligned AI while trying to build aligned AI while also cutting costs and racing with each other”. Cheap, Fast, and Doesn’t Kill Everyone: choose only two.
My fundamental crux here is that this is ultimately going to result in more high-quality alignment work being done than LW will do, and the incentives for capabilities also result in incentives for AI safety and control
I don’t know for sure if the infohazard concept is that useful. I think it could be, but only given certain assumptions. If you discovered a concept that advances alignment significantly and can’t be used much for capabilities, you should definitely scream it to everyone listening, and to many who aren’t. But “this would lead to better alignment research” isn’t very useful if it leads to proportionally even stronger capabilities. The goal here isn’t just maximizing alignment knowledge, it’s closing a gap. Relative speed matters, not just absolute speed. That said, you may be right, but then we’re already discussing something far different from “Great Man Theory”. This is just a highly peculiar situation. It would be extremely convenient if some genius appeared who could solve alignment overnight on their own, starting from existing knowledge alone. It’s not very likely, if the history of science is anything to go by, because alignment is probably harder than, say, the theory of relativity. But however unlikely it is, it might be a better hope than relying on collaborative projects that doom alignment to keep lagging behind capabilities, even if it advances faster in absolute terms.
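To make the relative-speed point concrete, here is a toy back-of-the-envelope sketch (all rates and the threshold are completely made up, purely for illustration): publishing something that speeds up alignment can still leave you worse off if it speeds up capabilities proportionally more, because what matters is how much alignment gets done before capabilities crosses the finish line.

```python
# Toy model of the "relative speed matters" point. All numbers are made up
# and purely illustrative: constant research rates, a single capabilities
# threshold, no feedback loops.

def alignment_done_before_threshold(align_rate, cap_rate, cap_threshold=100.0):
    """Units of alignment progress accumulated by the time capabilities
    research reaches cap_threshold, assuming constant rates."""
    years_until_threshold = cap_threshold / cap_rate
    return align_rate * years_until_threshold

# Baseline: alignment advances 2 units/year, capabilities 10 units/year.
baseline = alignment_done_before_threshold(align_rate=2.0, cap_rate=10.0)

# Publishing some dual-use insight doubles the alignment rate, but boosts
# capabilities by 2.5x.
after_publishing = alignment_done_before_threshold(align_rate=4.0, cap_rate=25.0)

print(baseline)          # 20.0 -- alignment done before the threshold
print(after_publishing)  # 16.0 -- more alignment per year, yet less done in time
```

With these made-up numbers, the published-insight world produces more alignment per year but less alignment in total before the threshold, which is the only quantity that matters in this framing.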
because infohazards tend to connote the idea that there are ridiculously impactful technologies that can be found by small groups
I don’t quite follow. Infohazards mean “some information is dangerous”. This doesn’t require small groups; in fact, “it’s safer if this information is only held by a small group rather than spread to the world at large” is inherently more true if you assume that the Great Man Theory is false, because regardless of how much you trust the small group, the small group will simply be less able to turn the information into dangerous technology than the collective intellect of humanity at large would be.
Cheap, Fast, and Doesn’t Kill Everyone: choose only two.
That is an extremely easy choice. “Doesn’t Kill Everyone” is blatantly essential. “Fast” is unfortunately a requirement, given that the open-source community intent on releasing everything to every nutjob, criminal, and failed state on the planet is only ~18 months behind you, so waiting until this can be done cheaply means that there will be many thousands of groups that can do this, and we’re all dead if any one of them does something stupid (a statistical inevitability). So the blindingly obvious decision is to skip “Cheap” and hit up the tech titans for tens of billions of dollars, followed by Wall Street for hundreds of billions of dollars as training costs increase, while also clamming up on publishing capabilities work, only publishing alignment work, and throwing grant money around to fund external alignment research. Which sounds to me like a description of the externally visible strategies of the ~3 labs making frontier models at the moment.
I honestly think this is still cheap. Non-cheap would be monumentally bigger, with much larger teams employed on alignment to attack it from all angles. I think we’re seeing Cheap and Fast, with the obvious implied problem.
You’re talking about a couple of thousand extremely smart people, quite a few of them alignment researchers (some of whom post regularly on Less Wrong/The Alignment Forum), and suggesting they’re all failing to notice the possibility of the extinction of the human race. The desirability of not killing everyone is completely obvious to anyone aware that it’s a possibility. Absolutely no one wants to kill themselves and all their friends and family. (This is obviously not a problem that a private bunker will help with: a paperclip maximizer will want to turn that into paperclips too. I believe Elon Musk is on record pointing out that Mars is not far enough to run to.)

Yes, there are people like Yann LeCun who are publicly making it clear that they’re still missing the point that this could happen any time soon. On the other hand, Sam Altman, Ilya Sutskever, Dario and Daniela Amodei, and Demis Hassabis are all on public record, with significant personal reputational skin in the game, saying that killing everyone is a real risk in the relatively near term and that avoiding it is obviously vital, while Sundar Pichai is constitutionally incapable of speaking in public without using the words ‘helpful’, ‘responsible’ and ‘trustworthy’ at least once every few paragraphs, so it’s hard to tell how worried he is. OpenAI routinely delay shipping their models for ~6 months while they and other external groups do safety work, Google just delayed Gemini Ultra for what sounds rather like safety reasons, and Anthropic are publicly committed to never shipping first, and never have. This is not what “cheap” + “fast” looks like.
Tens to hundreds of billions of dollars is not cheap in anyone’s book, not even the tech titans’. Add Google’s and Microsoft’s entire current market capitalizations together and you get 4 or 5 trillion. The only place we could get significantly more money than that to throw at the problem is the US government. Now, it is entirely true that the proportion of that largesse going to alignment research isn’t anything like as high as the proportion going to build training compute (though OpenAI did publicly commit 20% of their training compute to AGI-level alignment work, and that’s a lot of money). But if they threw a couple of orders of magnitude more than the $10m in grants that OpenAI just put into alignment, are there enough competent alignment researchers to spend it without seriously diminishing returns? I think alignment field-building is the bottleneck.
Just because this isn’t cheap relative to world GDP doesn’t mean it’s enough. If our goal were “build a Dyson sphere”, even throwing our whole productive capacity at it would be cheap. I’m not saying there aren’t any concerns, but the money is still mostly going to capabilities, and safety, while a concern, still has to be traded off against commercial needs and race dynamics (albeit mercifully dampened ones). Honestly, given LeCun’s position, we’re just lucky that Meta isn’t that good at AI, or they alone would set the pace of the race for everyone else.
I think Meta have been somewhat persuaded by the Biden administration to sign on for safety, or at least for safety theatre, despite LeCun. They actually did a non-trivial amount of real safety work on Llama-2 (a model small enough not to need it), and then never released one size of it for safety reasons. Which was of course pointless, or more exactly just showing off, since they then open-sourced the weights, including the base models, so anyone with $200 can fine-tune that safety work out again. However, it’s all basically window dressing, as these models are (we believe) too small to be an x-risk, and they were reasonably certain of that before they started (as far as we know, about the worst these models can do is write badly-written underage porn or phishing emails, or similarly marginally assist criminals).
Obviously no current models are an existential risk; the problem is the trajectory. Does the current way of handling the situation extrapolate properly to even just AGI, something that is an openly stated goal for many of these companies? I’d say not, or at least, I very much doubt it. As in, if you’re not doing that kind of work inside a triple-airgapped and firewalled desert island, and planning for layers upon layers of safety testing before even considering releasing the resulting product as a commercial tool, you’re doing it wrong. And that’s just technical safety: I still haven’t seen a serious proposal for how you make human labor entirely unnecessary while maintaining a semblance of economic order, instead of collapsing every social and political structure at once.