Progress often follows an s-curve, which appears exponential until the current research direction is exploited and tapers off. Moving an exponential up, even a little, early on can have large downstream consequences:
Your graph shows “a small increase” that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That’s not small! That’s as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you’d have to figure out some new, more efficient training method that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.
In general I wish you would actually write down equations for your model and plug in actual numbers; I think it would be way more obvious that things like this are not actually reasonable models of what’s going on.
But there is no clear distinction between eliminating capability overhangs and discovering new capabilities.
Yes, that’s exactly what Paul is saying? That one good thing about discovering new capabilities is that it eliminates capability overhangs? Why is this a rebuttal of his point?
For example, it took a few years for chain-of-thought prompting to become widely known beyond a small circle of people around AI Dungeon. Once chain-of-thought became publicly known, labs started fine-tuning models to explicitly do chain-of-thought, increasing their capabilities significantly. This gap between niche discovery and public knowledge drastically slowed down progress along the growth curve!
Why do you believe this “drastically” slowed down progress?
a lot more optimization pressure is being put into optimizing the arguments in favor of advancing capabilities than in the arguments for caution and advancing alignment, and so we should expect the epistemological terrain to be fraught in this case.
I think this is pretty false. There’s no equivalent to Let’s think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just “caution”) -- there’s a few paragraphs in Paul’s post, one short comment by me, and one short post by Kaj. I’d say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities.
(Just in this post you introduce a new argument about how small increases can make big differences to exponentials, whereas afaict there is basically just one argument for advancing capabilities that people put forward, i.e. your “Dangerous Argument 2”. Note that “Dangerous Argument 1” is not an argument for advancing capabilities, it is an argument that the negative effect is small in magnitude. You couldn’t reasonably apply that argument to advocate for pure capabilities work.)
(I agree in the wider world there’s a lot more optimization for arguments in favor of capabilities progress that people in general would find compelling, but I don’t think that matters for “what should alignment researchers think”.)
(EDIT: I thought of a couple of additional arguments that appeal to alignment researchers that people make for advancing capabilities, namely “maybe geopolitics gets worse in the future so we should have AGI sooner” and “maybe AGI can help us deal with all the other problems of the world, including other x-risks, so we should get it sooner”. I still think a lot more optimization pressure has gone into the arguments for caution.)
Advancing AI capabilities is heavily incentivized, especially when compared to alignment.
I’m not questioning your experiences at Conjecture—you’d know best what incentives are present there—but this has not been my experience at CHAI or DeepMind Alignment (the two places I’ve worked at).
Not OP, just some personal takes:

To me, it seems like the claim that is (implicitly) being made here is that small improvements early on compound to have much bigger impacts later on, and also a larger shortening of the overall timeline to some threshold. (To be clear, I don’t think the exponential model presented provides evidence for this latter claim.)
I think the first claim is obviously true. The second claim could be true in practice, though I feel quite uncertain about this. It happens to be false in the specific model of moving an exponential up (if you instantaneously double the progress at some point in time, the deadline moves one doubling-time closer, but the total amount of capabilities at every future point in time doubles). It might hold under a hyperbolic model or something; I think it would be interesting to nail down a quantitative model here.
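A quick numeric check of the parenthetical above (a minimal sketch with hypothetical numbers, not anyone’s actual model of progress):

```python
import math

# Hypothetical numbers, purely for illustration.
DOUBLING_TIME = 2.0   # years
C0 = 1.0              # capabilities today (arbitrary units)
THRESHOLD = 1000.0    # capability level treated as the "deadline"

def time_to_threshold(instant_doubling=False):
    """Time at which C0 * 2**(t / DOUBLING_TIME) reaches THRESHOLD,
    optionally after an instantaneous doubling of progress."""
    start = 2 * C0 if instant_doubling else C0
    return DOUBLING_TIME * math.log2(THRESHOLD / start)

baseline = time_to_threshold()                      # ~19.93 years
boosted = time_to_threshold(instant_doubling=True)  # ~17.93 years
print(baseline - boosted)  # 2.0: the deadline moves exactly one doubling time closer

# ...while capabilities at any later date are doubled relative to the default:
t = 10.0
print((2 * C0 * 2 ** (t / DOUBLING_TIME)) / (C0 * 2 ** (t / DOUBLING_TIME)))  # 2.0
```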
Why is this a rebuttal of his point?
As a trivial example, consider a hypothetical world (I don’t think we’re in literally this world, this is just for illustration) where an overhang is the only thing keeping us from AGI. Then in this world, closing the overhang faster seems obviously bad.
More generally, the only time when closing an overhang is obviously good is when (a) there is an extremely high chance that it will be closed sooner or later, and (b) the counterfactual impact on future capabilities beyond the point where the overhang is closed is fully screened off (i.e. nobody is spending more earlier because of impressive capabilities demos, people having longer to play with these better capabilities doesn’t lead to additional capabilities insights, and the overhang isn’t the only thing keeping us from AGI). The hypothetical trivial example at this opposite extreme is if you knew with absolute certainty that AI timelines were exactly X years, and you had a button that would cause X/2 years of capabilities research to happen instantaneously, then an X/2 year hiatus during which no capabilities research happens at all, then the resumption of capabilities research on exactly the same track as before, with no change in capabilities investment/insights relative to the counterfactual.
If you can’t be certain about these conditions (in particular, it seems the OP is claiming mostly that (a) is very hard to be confident about), then it seems like the prudent decision is not to close the overhang.
To me, it seems like the claim that is (implicitly) being made here is that small improvements early on compound to have much bigger impacts later on, and also a larger shortening of the overall timeline to some threshold.
As you note, the second claim is false for the model the OP mentions. I don’t care about the first claim once you know whether the second claim is true or false, which is the important part.
I agree it could be true in practice in other models but I am unhappy about the pattern where someone makes a claim based on arguments that are clearly wrong, and then you treat the claim as something worth thinking about anyway. (To be fair maybe you already believed the claim or were interested in it rather than reacting to it being present in this post, but I still wish you’d say something like “this post is zero evidence for the claim, people should not update at all on it, separately I think it might be true”.)
As a trivial example, consider a hypothetical world (I don’t think we’re in literally this world, this is just for illustration) where an overhang is the only thing keeping us from AGI. Then in this world, closing the overhang faster seems obviously bad.
To my knowledge, nobody in this debate thinks that advancing capabilities is uniformly good. Yes, obviously there is an effect of “less time for alignment research” which I think is bad all else equal. The point is just that there is also a positive impact of “lessens overhangs”.
If you can’t be certain about these conditions (in particular, it seems the OP is claiming mostly that (a) is very hard to be confident about), then it seems like the prudent decision is not to close the overhang.
I find the principle “don’t do X if it has any negative effects, no matter how many positive effects it has” extremely weird, but I agree that if you endorse it, it means you should never work on things that advance capabilities. But if you endorse that principle, why did you join OpenAI?
“This [model] is zero evidence for the claim” is a roughly accurate view of my opinion. I think you’re right that epistemically it would have been much better for me to have said something along those lines. Will edit something into my original comment.
Exponentials are memoryless. If you advance an exponential to where it would be one year from now, then some future milestone (like “level of capability required for doom”) appears exactly one year earlier. [...]
Errr, I feel like we already agree on this point? Like I’m saying almost exactly the same thing you’re saying; sorry if I didn’t make it prominent enough:
It happens to be false in the specific model of moving an exponential up (if you instantaneously double the progress at some point in time, the deadline moves one doubling-time closer, but the total amount of capabilities at every future point in time doubles).
I’m also not claiming this is an accurate model; I think I have quite a bit of uncertainty as to what model makes the most sense.
don’t do X if it has any negative effects, no matter how many positive effects it has
I was not intending to make a claim of this strength, so I’ll walk back what I said. What I meant to say was “I think most of the time the benefit of closing overhangs is much smaller than the cost of reduced timelines, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”. I think I was thinking too much inside my inside view when writing the comment, and baking in a few other assumptions from my model (including: closing overhangs benefiting capabilities at least as much, research being kinda inefficient (though not as inefficient as OP thinks probably)). I think on an outside view I would endorse a weaker but directionally same version of my claim.
In my work I do try to avoid advancing capabilities where possible, though I think I can always do better at this.
Yes, sorry, I realized that right after I posted and replaced it with a better response, but apparently you already saw it :(
What I meant to say was “I think most of the time closing overhangs is more negative than positive, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”.
But like, why? I wish people would argue for this instead of flatly asserting it and then talking about increased scrutiny or burdens of proof (which I also don’t like).
I think maybe the crux is the part about the strength of the incentives towards doing capabilities. From my perspective it generally seems like this incentive gradient is pretty real: getting funded for capabilities is a lot easier, it’s a lot more prestigious and high status in the mainstream, etc. I also myself viscerally feel the pull of wishful thinking (I really want to be wrong about high P(doom)!) and spend a lot of willpower trying to combat it (but also not so much that I fail to update where things genuinely are not as bad as I would expect, but also not allowing that to be an excuse for wishful thinking, etc...).
In that case, I think you should try and find out what the incentive gradient is like for other people before prescribing the actions that they should take. I’d predict that for a lot of alignment researchers your list of incentives mostly doesn’t resonate, relative to things like:
1. Active discomfort at potentially contributing to a problem that could end humanity
2. Social pressure + status incentives from EAs / rationalists to work on safety and not capabilities
3. Desire to work on philosophical or mathematical puzzles, rather than mucking around in the weeds of ML engineering
4. Wanting to do something big-picture / impactful / meaningful (tbc this could apply to both alignment and capabilities)
For reference, I’d list (2) and (4) as the main things that affect me, with maybe a little bit of (3), and I used to also be pretty affected by (1). None of the things you listed feel like they affect me much (now or in the past), except perhaps wishful thinking (though I don’t really see that as an “incentive”).
Your graph shows “a small increase” that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That’s not small! That’s as much progress as everyone else combined achieves in a third of the time till catastrophic models! It feels like you’d have to figure out some new, more efficient training method that allows you to get GPT-3 levels of performance with GPT-2 levels of compute to have an effect that was plausibly that large.
In general I wish you would actually write down equations for your model and plug in actual numbers; I think it would be way more obvious that things like this are not actually reasonable models of what’s going on.
I’m not sure I get your point here: the point of the graph is to just illustrate that when effects compound, looking only at the short term difference is misleading. Short term differences lead to much larger long term effects due to compounding. The graph was just a quick digital rendition of what I previously drew on a whiteboard to illustrate the concept, and is meant to be an intuition pump.
The model is not implying any more sophisticated complex mathematical insight than just “remember that compounding effects exist”, and the graph is just for illustrative purposes.
Of course, if you had perfect information at the bottom of the curve, you would see that the effect your “small” intervention is having is actually quite big: but that’s precisely the point of the post, it’s very hard to see this normally! We don’t have perfect information, and the post aims to raise to salience in people’s minds that what they perceive as a “small” action in the present moment will likely lead to a “big” impact later on.
To illustrate the point: if you make a discovery now worth 2 billion dollars more of investment in AI capabilities, and this compounds yearly at a 20% rate, you’ll get far more than +2 billion in the final total, e.g., 10 years later. If you make this 2 billion dollar discovery later, after ten years you will not have as much money invested in capabilities as you would have in the other case!
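Spelled out with these numbers (the 20% rate and 10-year horizon are the hypothetical figures from the illustration above):

$$\$2\text{B} \times 1.2^{10} \approx \$12.4\text{B} \quad \text{(discovery made now)} \qquad \text{vs.} \qquad \$2\text{B} \quad \text{(same discovery made at year 10)}.$$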
Such effects might be obvious in retrospect with perfect information, but this is indeed the point of the post: when evaluating actions in our present moment it’s quite hard to foresee these things, and the post aims to raise these effects to salience!
We could spend time on more graphs, equations and numbers, but that wouldn’t be a great marginal use of our time. Feel free to spend more time on this if you find it worthwhile (it’s a pretty hard task, since no one has a sufficiently gears-level model of progress!).
I continue to think that if your model is that capabilities follow an exponential (i.e. dC/dt = kC), then there is nothing to be gained by thinking about compounding. You just estimate how much time it would have taken for the rest of the field to make an equal amount of capabilities progress now. That’s the amount you shortened timelines by; there’s no change from compounding effects.
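A minimal way to write this out, assuming the pure exponential model named above (just the standard solution of that equation, not a model anyone here has otherwise committed to):

$$\frac{dC}{dt} = kC \;\Rightarrow\; C(t) = C_0 e^{kt}.$$

If a discovery multiplies current capabilities by $(1+\epsilon)$ at time $t_0$, then for every $t \ge t_0$,

$$(1+\epsilon)\,C_0 e^{kt} = C_0 e^{k(t+\Delta t)}, \qquad \Delta t = \frac{\ln(1+\epsilon)}{k},$$

so every future milestone, including any fixed “catastrophe” threshold, arrives exactly $\Delta t$ earlier: the time it would have taken the rest of the field to make that much progress anyway, with no extra shortening from compounding.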
if you make a discovery now worth 2 billion dollars [...] If you make this 2 billion dollar discovery later
Two responses:
1. Why are you measuring value in dollars? That is both (a) a weird metric to use and (b) not the one you had on your graph.
2. Why does the discovery have the same value now vs later?
I think this is pretty false. There’s no equivalent to Let’s think about slowing down AI, or a tag like Restrain AI Development (both of which are advocating an even stronger claim than just “caution”) -- there’s a few paragraphs in Paul’s post, one short comment by me, and one short post by Kaj. I’d say that hardly any optimization has gone into arguments to AI safety researchers for advancing capabilities. [...] (I agree in the wider world there’s a lot more optimization for arguments in favor of capabilities progress that people in general would find compelling, but I don’t think that matters for “what should alignment researchers think”.)
Thanks for the reply! From what you’re saying here, it seems like we already agree that “in the wider world there’s a lot more optimization for arguments in favor of capabilities progress”.
I’m surprised to hear that you “don’t think that matters for ‘what should alignment researchers think’.”
Alignment researchers are part of the wider world too! And conversely, a lot of people in the wider world that don’t work on alignment directly make relevant decisions that will affect alignment and AI, and think about alignment too (likely many more of those exist than “pure” alignment researchers, and this post is addressed to them too!)
I don’t buy this separation with the wider world. Most people involved in this live in social circles connected to AI development, they’re sensitive to status, many work at companies directly developing advanced AI systems, consume information from the broader world and so on. And the vast majority of the real world’s economy has so far been straightforwardly incentivizing reasons to develop new capabilities, faster. Here’s some tweets from Kelsey that illustrate some of this point.
conversely, a lot of people in the wider world that don’t work on alignment directly make relevant decisions that will affect alignment and AI, and think about alignment too (likely many more of those exist than “pure” alignment researchers, and this post is addressed to them too!)
Your post is titled “Don’t accelerate problems you’re trying to solve”. Given that the problem you’re considering is “misalignment”, I would have thought that the people trying to solve the problem are those who work on alignment.
The first sentence of your post is “If one believes that unaligned AGI is a significant problem (>10% chance of leading to catastrophe), speeding up public progress towards AGI is obviously bad.” This is a foundational assumption for the rest of your post. I don’t really know who you have in mind as these other people, but I would guess that they don’t assign >10% chance of catastrophe.
The people you cite as making the arguments you disagree with are full-time alignment researchers.
If you actually want to convey your points to some other audience I’d recommend making another different post that doesn’t give off the strong impression that it is talking to full-time alignment researchers.
I don’t buy this separation with the wider world. Most people involved in this live in social circles connected to AI development, they’re sensitive to status, many work at companies directly developing advanced AI systems, consume information from the broader world and so on.
I agree that status and “what my peers believe” determine what people do to a great extent. If you had said “lots of alignment researchers are embedded in communities where capabilities work is high-status; they should be worried that they’re being biased towards capabilities work as a result”, I wouldn’t have objected.
You also point out that people hear arguments from the broader world, but it seems like arguments from the community are way way way more influential on their beliefs than the ones from the broader world. (For example, they think there’s >10% chance of catastrophe from AI based on arguments from this community, despite the rest of the world arguing that this is dumb.)
the vast majority of the real world’s economy has so far been straightforwardly incentivizing reasons to develop new capabilities, faster. Here’s some tweets from Kelsey that illustrate some of this point.
I looked at the linked tweet and a few surrounding it and they seem completely unrelated? E.g. the word “capabilities” doesn’t appear at all (or its synonyms).
I’m guessing you mean Kelsey’s point that EAs go to orgs that think safety is easy because those are the ones that are hiring, but (a) that’s not saying that those EAs then work on capabilities and (b) that’s not talking about optimized arguments, but instead about selection bias in who is hiring.
Your graph shows “a small increase” that represents progress that is equal to an advance of a third to a half the time left until catastrophe on the default trajectory. That’s not small!
Yes, I was going to say something similar. It looks like the value of the purple curve is about double the blue curve when the purple curve hits AGI. If they have the same doubling time, that means the “small” increase is a full doubling of progress, all in one go. Also, the time you arrive ahead of the original curve is equal to the time it takes the original curve to catch up with you. So if your “small” jump gets you to AGI in 10 years instead of 15, then your “small” jump represents 5 years of progress. This is easier to see on a log plot:
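A minimal sketch (hypothetical numbers, not the original figure) of the kind of linear-vs-log comparison being described:

```python
# Illustrative only: an exponential trajectory with a one-time jump, on linear
# and log axes. On the log axis the jump is a constant vertical offset, i.e. a
# fixed horizontal time shift.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 15, 500)          # years (hypothetical)
doubling_time = 2.0
base = 2 ** (t / doubling_time)      # default trajectory
boosted = base * np.where(t >= 3.0, 2.0, 1.0)  # one-time doubling at t = 3

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, scale in zip(axes, ["linear", "log"]):
    ax.plot(t, base, label="default")
    ax.plot(t, boosted, label="with one-time jump")
    ax.set_yscale(scale)
    ax.set_xlabel("years")
    ax.set_ylabel("capabilities (arbitrary units)")
    ax.set_title(f"{scale} scale")
    ax.legend()
plt.tight_layout()
plt.show()
```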
A horizontal shift seems more realistic. You just compress one part of the curve or skip ahead along it. Then a small shift early on leads to an equally small shift later.
Of course, there are other considerations like overhang and community effects that complicate the picture, but this seems like a better place to start.