This style of thinking seems illogical to me. It has already clearly resulted in a sort of evaporative cooling at OpenAI. At a high level, is it possible you have the opposite of the wishful-thinking bias you claim OpenAI researchers have? I won’t go into too much detail about why this post doesn’t make sense to me, as others already have. But broadly speaking:
I doubt rationality gives you much of an advantage in capabilities research, and asserting that it does, on a site full of rationalists, comes across as almost pretentious.
I also have no idea how any alignment research so far has helped capabilities in any way. I don’t even know how RLHF has helped capabilities; if anything, it’s well documented that RLHF diminishes them (base models, for example, can play chess quite well). The vast majority of alignment research, especially research from before LLMs, isn’t even useful for alignment (a lot of it seems far too ungrounded).
There was never a real shot at solving alignment until LLMs arrived, either. The world has changed and the foom priors look wrong, but most people here haven’t updated. It increasingly seems like we’ll get strong precursor models, so we will have ample time to engineer solutions; it won’t be like trying to build a working rocket on the first try. (The reason is that we are rapidly approaching the limits of energy availability and transistor density without being anywhere close to fooming.) Yet this mental model remains popular even as reality diverges from it.
I actually have a hunch as to why: many people holding the above priors don’t want to let them go, because letting go would mean that this problem they have dedicated a lot of mental space to starts to seem feasible to solve.
If it’s instead a boring engineering problem, this stops being a quest to save the world or an all-consuming issue. Incremental alignment work might solve it, so to preserve the difficulty of the issue, one insists it will cause extinction for some far-fetched reason. Building precursor models and then bootstrapping alignment might solve it, so this “foom” is invented and held on to (on a lot of highly speculative assumptions), because otherwise it would be a boring engineering problem that requires lots of effort rather than something a lone genius has to solve. The question of whether energy constraints will limit AI progress from here on out was met with a “maybe”, but the number of upvotes makes me think most readers just filed it away as an unconditional “no, it won’t”.
There is a good reason to think like this: if boring engineering really does solve the issue, then this community is better off assuming it won’t. In that scenario, the boring engineering work is being done by the tech industry anyway, so there is no need to help there. But I hope that people who adopt the worst-case assumption in order to maximize the expected value of their research remember that the assumption is an assumption, and don’t let its mental effects consume them.
This style of thinking seems illogical to me. It has already clearly resulted in a sort of evaporative cooling in OpenAI.
I don’t think what’s happening at OpenAI is “evaporative cooling as a result of people being too risk-averse to do alignment work that’s adjacent to capabilities”. I would describe it more as “purging anyone who tries to provide oversight”. I don’t think the safety-conscious people leaving OpenAI are doing so because of concerns like the OP’s; they are doing so because they are being marginalized and the organization is acting in a fairly obviously reckless way.
Preface: I think this comment will be pretty unpopular here.
I think this is a very unhelpful frame for any discussion (all the more so the higher the stakes), for the reasons that SlateStarCodex outlines in Against Bravery Debates, and I think your comment would be better with this removed.
I think you make some valid points. In particular, I agree that some people seem to have fallen into a trap of being unrealistically pessimistic about AI outcomes which mirrors the errors of those AI developers and cheerleaders who are being unrealistically optimistic.
On the other hand, I disagree with this critique (although I can see where you’re coming from):
If it’s instead a boring engineering problem, this stops being a quest to save the world or an all-consuming issue. Incremental alignment work might solve it, so to preserve the difficulty of the issue, one insists it will cause extinction for some far-fetched reason. Building precursor models and then bootstrapping alignment might solve it, so this “foom” is invented and held on to (on a lot of highly speculative assumptions), because otherwise it would be a boring engineering problem that requires lots of effort rather than something a lone genius has to solve.
I think that FOOM is a real risk, and my calculations about available algorithmic-efficiency improvements, grounded in estimates of the compute of the human brain, are backed by a lot of evidence. The conclusion I draw from believing that FOOM is both possible and indeed likely, once AI models reach a certain threshold of AI R&D capability, is that preventing and controlling FOOM is an engineering problem.
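To make the flavor of such calculations concrete, here is a minimal back-of-the-envelope sketch. All the numbers are hypothetical round figures I am supplying for illustration (one commonly cited brain-compute estimate, a rough frontier-training-run scale, and an assumed efficiency headroom), not figures from the comment above:

```python
# Illustrative back-of-the-envelope arithmetic; every constant is an assumption.
BRAIN_FLOPS = 1e15        # one common estimate of human-brain compute, FLOP/s
TRAINING_FLOPS = 1e25     # rough total FLOP of a frontier training run (assumed)
EFFICIENCY_GAIN = 100     # hypothetical remaining algorithmic-efficiency headroom

# Effective compute if that headroom were fully captured
effective_flops = TRAINING_FLOPS * EFFICIENCY_GAIN

# Express it as "how long would a human brain take to do this much computation"
brain_seconds = effective_flops / BRAIN_FLOPS
brain_years = brain_seconds / (3600 * 24 * 365)
print(f"~{brain_years:,.0f} brain-years of equivalent compute")
```

The point is not the specific output but the shape of the argument: modest-looking efficiency multipliers translate into very large amounts of brain-equivalent compute, which is why the headroom estimate matters so much.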
I don’t think we should expect a model in training to become super-human so fast that it blows past our ability to evaluate it. I do think that in order to have the best chance of catching and controlling a rapidly accelerating take-off, we need to do pre-emptive engineering work. We need very comprehensive evals to have detailed measures of key factors like general capability, reasoning, deception, self-preservation, and agency. We need carefully designed high-security training facilities with air-gapped datacenters. We need regulation that prevents irresponsible actors from undertaking unsafe experiments. Indeed, most of the critical work toward preventing uncontrolled rogue AGI due to FOOM is well described by ‘boring engineering problems’ or ‘boring regulation and enforcement problems’.
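The eval-gating idea above can be sketched in a few lines. Everything here is hypothetical: the metric names, the threshold values, and the pause rule are inventions for illustration, not any lab’s actual policy:

```python
# Hypothetical sketch of gating a training run on pre-registered eval scores.
# All names and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class EvalResult:
    capability: float        # 0-1, higher = more generally capable
    deception: float         # 0-1, higher = more deceptive behavior observed
    self_preservation: float # 0-1, higher = more self-preserving behavior
    agency: float            # 0-1, higher = more autonomous goal pursuit

# Thresholds set in advance: a ceiling on capability, floors on risk metrics
THRESHOLDS = {
    "capability": 0.8,
    "deception": 0.2,
    "self_preservation": 0.2,
    "agency": 0.5,
}

def should_pause_training(result: EvalResult) -> bool:
    """Pause if capability crosses its ceiling or any risk metric its floor."""
    if result.capability >= THRESHOLDS["capability"]:
        return True
    return any(
        getattr(result, name) >= THRESHOLDS[name]
        for name in ("deception", "self_preservation", "agency")
    )
```

The design choice doing the work here is that the thresholds are fixed *before* training starts, so the decision to pause is mechanical rather than renegotiated under commercial pressure mid-run.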
Believing in the dangers of recursive self-improvement doesn’t necessarily mean believing that the best solution is a genius theoretical answer to value and intent alignment. I wouldn’t rule that out, but I certainly don’t expect that slim possibility, and it seems foolish to trust in it as the primary hope for humanity. Instead, let’s focus on doing the necessary engineering and political work so that we can proceed with reasonable safety measures in place!
Added: I appreciate the edit :)