Both as a mindset and as a factual likelihood.
For mindset, I agree that doomerism isn’t good, primarily because it can close your mind off from real solutions to a problem and make you over-update toward an overly pessimistic view.
As a factual claim, I also disagree with high p(Doom) estimates; my own is at most 10%, if not lower.
As for object-level arguments for why I disagree with the doom take, here they are:
I disagree with the Yudkowskian assumption that certain abstractions just don’t scale well when we crank up capabilities. I remember a post that did interpretability work on AlphaZero and found that it has essentially human-interpretable abstractions, which, at least in the case of Go, disproved that Yudkowskian notion.
I am quite a bit more optimistic about scalable alignment than many in the LW community. Recent work showed that as an AI got more data, it became more aligned with human goals. There are many other benefits in that work, but the fact that alignment scaled up as a certain capability scaled up means the trend of alignment is positive, and more capable models will probably be more aligned.
Finally, trend lines. There’s a saying inspired by the book Atomic Habits: the trend line matters more than how much progress you make in a single sitting. In the case of alignment, that trend line is positive but slow, which means we are in an extremely good position to speed it up. It also means we should be far less worried about doom, as we just have to increase the trend line of alignment progress and wait.
Edit: My first point is at best partially correct, and may need to be removed altogether due to a new paper called Adversarial Policies Beat Superhuman Go AIs.
Link below:
https://arxiv.org/abs/2211.00241
All other points stand.
The recent paper Adversarial Policies Beat Superhuman Go AIs seems to cast doubt on how well abstractions generalize in the case of Go.
I’ll admit, that is a fairly big blow to my first point, though the rest of my points stand. I’ll edit the comment to mention your debunking of my first point.