I would argue that accelerating alignment research more than capabilities research should actually be considered a basic safety feature.
A more straightforward but extreme approach here is just to ban plausibly capabilities/scaling-related ML usage of the API unless users are approved as doing safety research. That is, if you think advancing ML is somewhat bad, you can just stop people from doing it.
That said, I think a large fraction of ML research seems maybe fine/good, and the main bad things are just algorithmic efficiency improvements for serious scaling (including better data) and other types of architectural changes.
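For concreteness, here is a minimal sketch of the kind of gate this would imply: block requests that look like restricted capabilities/scaling work unless the user is on an approved safety-research allowlist. Everything here (the category names, the allowlist, the keyword classifier) is a hypothetical placeholder, not any actual provider's API or policy.

```python
# Hypothetical sketch of an allowlist-based gate for capabilities/scaling usage.
# Category names, allowlist entries, and the classifier are all placeholders.

RESTRICTED_CATEGORIES = {"scaling_efficiency", "architecture_research"}
APPROVED_SAFETY_RESEARCHERS = {"user_123"}  # placeholder allowlist


def classify_request(prompt: str) -> str:
    """Hypothetical classifier mapping a prompt to a coarse research category."""
    text = prompt.lower()
    if "scaling law" in text or "attention variant" in text:
        return "scaling_efficiency"
    return "other"


def is_allowed(user_id: str, prompt: str) -> bool:
    """Allow the request unless it looks like restricted capabilities work
    and the user is not an approved safety researcher."""
    category = classify_request(prompt)
    if category in RESTRICTED_CATEGORIES and user_id not in APPROVED_SAFETY_RESEARCHERS:
        return False
    return True


if __name__ == "__main__":
    prompt = "Help me derive better scaling laws for my next training run"
    print(is_allowed("user_456", prompt))  # False: restricted category, not approved
    print(is_allowed("user_123", prompt))  # True: approved safety researcher
```

In practice the hard part is the classifier and the approval process, not the gate itself, which is part of why this approach is extreme.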
Presumably this already bites (e.g.) virus gain-of-function researchers who would like to make more dangerous pathogens, but can’t get advice from LLMs.
I am not sure whether I am more excited about 'positive' approaches (accelerating alignment research) or 'negative' approaches (cooling down capability-gain research). I agree that some sorts of capability-gain research are much more (or less) dangerous than others, and the most clearly risky stuff right now is scaling and scaling-related work.