Can anyone point me to a write-up steelmanning the OpenAI safety strategy? Or, alternatively, can you offer your own take on it? To my knowledge there's no official post on this, but has anyone written an informal one?
Essentially, what I'm looking for is an expanded, OpenAI-focused version of AXRP ep 16 with Geoffrey Irving, in which he lays out the case for DeepMind's recent work on language model alignment. The closest thing I know of is AXRP ep 6 with Beth Barnes.