I just noticed that the author of the lie detection paper I mentioned has written about its implications for alignment here on LW. @Collin thanks for writing that up. Any thoughts you have on the above would be welcome!