The most important thing to realize about AI alignment is that basically every version of practically aligned AI must assume that no one takes some specific action (mostly for misuse reasons, though for some specific plans it can also be for misalignment reasons).
Another way to say it: I believe that in practice these two categories are the same, such that basically all useful work in the field requires someone not to do something. The costs of sharing are therefore practically zero, and the expected value of sharing insights is likely very large.
Specifically, I’m asserting that these two categories are actually one category for most purposes: “Actually make AI safe,” and the other, sadder but easier field of “Make AI safe as long as no one does the thing.”