To me, it seems like the claim (implicitly) being made here is that small improvements early on compound into much bigger impacts later on, and also into a larger shortening of the overall timeline to some threshold.
As you note, the second claim is false for the model the OP mentions. The second claim is the important part; once you know whether it is true or false, I don’t care about the first claim.
I agree it could be true in practice under other models, but I am unhappy about the pattern where someone makes a claim based on arguments that are clearly wrong, and then you treat the claim as something worth thinking about anyway. (To be fair, maybe you already believed the claim or were interested in it, rather than reacting to its presence in this post, but I still wish you’d say something like “this post is zero evidence for the claim, people should not update at all on it; separately, I think it might be true”.)
As a trivial example, consider a hypothetical world (I don’t think we’re literally in this world; this is just for illustration) where an overhang is the only thing keeping us from AGI. In this world, closing the overhang faster seems obviously bad.
To my knowledge, nobody in this debate thinks that advancing capabilities is uniformly good. Yes, obviously there is an effect of “less time for alignment research” which I think is bad all else equal. The point is just that there is also a positive impact of “lessens overhangs”.
If you can’t be certain about these conditions (in particular, the OP seems to be mostly claiming that (a) is very hard to be confident about), then the prudent decision seems to be not to close the overhang.
I find the principle “don’t do X if it has any negative effects, no matter how many positive effects it has” extremely weird, but I agree that if you endorse it, you should never work on things that advance capabilities. If you do endorse that principle, though, why did you join OpenAI?
“This [model] is zero evidence for the claim” roughly captures my opinion. I think you’re right that, epistemically, it would have been much better for me to have said something along those lines. Will edit something into my original comment.
Exponentials are memoryless. If you advance an exponential to where it would be one year from now, then some future milestone (like “level of capability required for doom”) appears exactly one year earlier. [...]
Errr, I feel like we already agree on this point? Like I’m saying almost exactly the same thing you’re saying; sorry if I didn’t make it prominent enough:
It happens to be false in the specific model of moving an exponential up (if you instantaneously double the progress at some point in time, the deadline moves one doubling-time closer, but the total amount of capabilities at every future point in time doubles).
I’m also not claiming this is an accurate model; I think I have quite a bit of uncertainty as to what model makes the most sense.
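To spell out the arithmetic behind the exponential model quoted above (a minimal sketch of the pure-exponential case, not a claim that this is the right model): if capabilities grow as $C(t) = C_0 e^{t/\tau}$ and progress is instantaneously doubled at some point in time, the trajectory afterwards is $C'(t) = 2C(t) = C_0 e^{(t + \tau \ln 2)/\tau}$. So any fixed capability threshold is crossed exactly one doubling time ($\tau \ln 2$) earlier, while the capability level at every later point in time is doubled.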
don’t do X if it has any negative effects, no matter how many positive effects it has
I was not intending to make a claim of that strength, so I’ll walk back what I said. What I meant to say was “I think most of the time the benefit of closing overhangs is much smaller than the cost of reduced timelines, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”. I think I was leaning too heavily on my inside view when writing the comment, and baking in a few other assumptions from my model (including: closing overhangs benefiting capabilities at least as much, and research being fairly inefficient (though probably not as inefficient as the OP thinks)). On an outside view, I think I would endorse a weaker but directionally similar version of my claim.
In my work I do try to avoid advancing capabilities where possible, though I think I can always do better at this.
Yes, sorry, I realized that right after I posted and replaced it with a better response, but apparently you already saw it :(
What I meant to say was “I think most of the time closing overhangs is more negative than positive, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”.
But like, why? I wish people would argue for this instead of flatly asserting it and then talking about increased scrutiny or burdens of proof (which I also don’t like).
I think maybe the crux is the part about the strength of the incentives towards doing capabilities. From my perspective this incentive gradient generally seems pretty real: getting funded for capabilities is a lot easier, it’s a lot more prestigious and high status in the mainstream, etc. I also viscerally feel the pull of wishful thinking myself (I really want to be wrong about high P(doom)!) and spend a lot of willpower trying to combat it (though not so much that I fail to update when things genuinely are not as bad as I expected, while also not letting that become an excuse for wishful thinking, etc.).
In that case, I think you should try and find out what the incentive gradient is like for other people before prescribing the actions that they should take. I’d predict that for a lot of alignment researchers your list of incentives mostly doesn’t resonate, relative to things like:
1. Active discomfort at potentially contributing to a problem that could end humanity
2. Social pressure + status incentives from EAs / rationalists to work on safety and not capabilities
3. Desire to work on philosophical or mathematical puzzles, rather than mucking around in the weeds of ML engineering
4. Wanting to do something big-picture / impactful / meaningful (tbc this could apply to both alignment and capabilities)
For reference, I’d list (2) and (4) as the main things that affect me, with maybe a little bit of (3), and I used to also be pretty affected by (1). None of the things you listed feel like they affect me much (now or in the past), except perhaps wishful thinking (though I don’t really see that as an “incentive”).