I actually somewhat disagree with a point he makes, and I think this is related to why I'm not nearly as pessimistic as the most extreme pessimists on LW, despite thinking we are in approximately the worst-case scenario.
First, some meta points on why I disagree with the inevitable pessimism:
Catastrophe predictions have a poor track record. I'd be surprised if any predicted catastrophe had actually happened. This alone gives reason for a strong prior that catastrophe won't happen.
Most technologies that actually get built have lower impact than people expect. While I do think agentic AI is a reasonably special case, I am also mindful of how many predictions of high impact had to be revised downward.
LW has deep selection biases/effects in two directions: believing AI timelines are short, and believing AI will have massive impact. This means we shouldn't be surprised if people there don't have good arguments against LW views on AI, even if the true evidence for doom isn't overwhelming, or is indeed quite weak.
Now, some object-level points on why I disagree with David Chapman.
I flatly disagree with "We've found zero that lead to good outcomes", since there are, in fact, scenarios with good outcomes. That doesn't mean we can relax now, but contra Eliezer, our position isn't one of dying with dignity; rather, we are still making progress.
I think there is evidence against the hypothesis that powerful optimizers inevitably Goodhart what we value, and it's in air conditioning. John Wentworth had posts on how consumers don't even realize that much better air conditioning exists. The comment threads, however, showed that the claimed advantages of double-hosed air conditioners were far less universal and far less impactful than John Wentworth thought they were.
This is non-trivial evidence against the claim because John Wentworth admitted he cherry-picked this example: if even the hand-picked case doesn't hold up, the broader claim is on weaker footing.
The Natural Abstraction hypothesis could be true, and even a weak version being true would have massive benefits for alignment, primarily for the reasons given in Thane Ruthenis's post. This is a big reason why I remain hopeful about alignment.
Post below:
https://www.lesswrong.com/posts/HaHcsrDSZ3ZC2b4fK/world-model-interpretability-is-all-we-need