Yeah, all four of those are real things happening, and are exactly the sorts of things I think the post has in mind.
I take “make AI alignment seem legit” to refer to a bunch of actions that are optimized to push public discourse and perceptions around. Here’s a list of things that come to my mind:
- Trying to make alignment research look more like a mainstream field, e.g. by funding professors and PhD students who frame their work as alignment and giving them publicity, organizing conferences that rope in existing players who have perceived legitimacy, etc.
- Papers like Concrete Problems in AI Safety that try to tie AI risk to stuff that's already in the Overton window / already perceived as legitimate
- Optimizing the language in posts / papers to be perceived well, e.g. by steering clear of the part where we're worried AI will literally kill everyone
- Efforts to make it politically untenable for AI orgs not to have some narrative around safety
Each of these things has a core good thing, but according to me they've all backfired to the extent that they were optimized to avoid the thorny parts of AI x-risk, because that enables rampant goodharting. Specifically, I think the effects of avoiding the core stuff have been bad: it has created weird cargo cults around alignment research, made it easier for orgs to have fake narratives about how much they care about alignment, and so on.