I am not sure whether I am more excited about ‘positive’ approaches (accelerating alignment research) or ‘negative’ approaches (cooling down capability-gain research). I agree that some sorts of capability-gain research are much more dangerous than others, and the most clearly risky stuff right now is scaling and scaling-related work.