I looked over a bit of David’s public-facing work, e.g. https://www.youtube.com/watch?v=I7hJggz41oU
I think there is a fundamental difference between robust, security-minded alignment and tweaking smaller language models to produce output that “looks” correct. David seems very optimistic about how easy these problems are to solve.