Thanks, Seth, for your post! I believe I get your point, and in fact I wrote a post that describes exactly that approach in detail. I recommend conditioning the model with an existing technique called control vectors (or steering vectors), which achieves a raw but incomplete form of safety. In my opinion, that is just enough partial safety to work on solving full safety with the help of AIs.
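For anyone unfamiliar with control vectors, here is a minimal sketch of one common construction: take the difference between mean activations on examples of the desired and undesired behavior, then add a scaled copy of that difference to a hidden state at inference time. The function names and the 3-dimensional activations are hypothetical toy values, not taken from any real model or library, and a real setup would read and write activations at a chosen layer of the network.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def control_vector(positive_acts, negative_acts):
    """Mean activation on desired examples minus mean on undesired examples."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos, neg)]


def steer(hidden_state, vector, strength=1.0):
    """Shift one hidden state along the control vector by `strength`."""
    return [h + strength * v for h, v in zip(hidden_state, vector)]


# Made-up activations standing in for a real layer's outputs (assumption).
positive_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
negative_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

vector = control_vector(positive_acts, negative_acts)
steered = steer([0.5, 0.5, 0.5], vector, strength=0.5)
```

The `strength` parameter is the knob that makes this a raw form of conditioning: a larger value pushes harder toward the desired behavior, but nothing in the sketch guarantees the behavior is fully achieved.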
Of course, I am happy to be challenged.