Exponentials are memoryless. If you advance an exponential to where it would be one year from now, then some future milestone (like “level of capability required for doom”) appears exactly one year earlier. [...]
Errr, I feel like we already agree on this point? Like I’m saying almost exactly the same thing you’re saying; sorry if I didn’t make it prominent enough:
It happens to be false in the specific model of moving an exponential up (if you instantaneously double the progress at some point in time, the deadline moves one doubling-time closer, but the total amount of capabilities at every future point in time doubles).
I’m also not claiming this is an accurate model; I think I have quite a bit of uncertainty as to what model makes the most sense.
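For concreteness, here is the arithmetic behind that toy model (a sketch only; the symbols $C_0$, $T$, $t_0$, and $C^*$ are mine, chosen for illustration, not taken from the original discussion):

```latex
% Toy model (illustrative only): capabilities grow exponentially with doubling time T.
\begin{align*}
  C(t) &= C_0 \, 2^{t/T} \\
  % Instantaneously doubling progress at time t_0 gives, for t \ge t_0:
  \tilde{C}(t) &= 2\,C(t) = C(t + T) \\
  % So a milestone C^* = C(t^*) is hit exactly one doubling time earlier,
  % while the capability level at every later date is doubled:
  \tilde{C}(t^* - T) &= C(t^*) = C^*, \qquad \frac{\tilde{C}(t)}{C(t)} = 2 \ \text{ for all } t \ge t_0.
\end{align*}
```

Under this toy model both statements hold at once: the deadline moves one doubling-time closer, and the capability level at every future point in time is doubled.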
don’t do X if it has any negative effects, no matter how many positive effects it has
I was not intending to make a claim of this strength, so I’ll walk back what I said. What I meant to say was “I think most of the time the benefit of closing overhangs is much smaller than the cost of reduced timelines, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”. I think I was leaning too heavily on my inside view when writing the comment, and baking in a few other assumptions from my model (including: closing overhangs benefiting capabilities at least as much, research being kinda inefficient (though probably not as inefficient as OP thinks)). I think on an outside view I would endorse a weaker but directionally similar version of my claim.
In my work I do try to avoid advancing capabilities where possible, though I think I can always do better at this.
Yes, sorry, I realized that right after I posted and replaced it with a better response, but apparently you already saw it :(
What I meant to say was “I think most of the time closing overhangs is more negative than positive, and I think it makes sense to apply OP’s higher bar of scrutiny to any proposed overhang-closing proposal”.
But like, why? I wish people would argue for this instead of flatly asserting it and then talking about increased scrutiny or burdens of proof (which I also don’t like).
I think maybe the crux is the part about the strength of the incentives towards doing capabilities. From my perspective this incentive gradient generally seems pretty real: getting funded for capabilities is a lot easier, it’s a lot more prestigious and high status in the mainstream, etc. I myself also viscerally feel the pull of wishful thinking (I really want to be wrong about high P(doom)!) and spend a lot of willpower trying to combat it (though not so much that I fail to update when things genuinely are not as bad as I would expect, while also not letting that become an excuse for wishful thinking, etc.).
In that case, I think you should try and find out what the incentive gradient is like for other people before prescribing the actions that they should take. I’d predict that for a lot of alignment researchers your list of incentives mostly doesn’t resonate, relative to things like:
1. Active discomfort at potentially contributing to a problem that could end humanity
2. Social pressure + status incentives from EAs / rationalists to work on safety and not capabilities
3. Desire to work on philosophical or mathematical puzzles, rather than mucking around in the weeds of ML engineering
4. Wanting to do something big-picture / impactful / meaningful (tbc this could apply to both alignment and capabilities)
For reference, I’d list (2) and (4) as the main things that affect me, with maybe a little bit of (3), and I used to also be pretty affected by (1). None of the things you listed feel like they affect me much (now or in the past), except perhaps wishful thinking (though I don’t really see that as an “incentive”).