Yeah I was fairly sloppy here. I did mean the “like” to include tweaking to be as accurate as possible, but that plausibly didn’t bring the comment above some bar.
For clarity: I haven’t read the paper yet, and based on my current understanding I can’t guess what your complaint would be. Ryan’s more careful “the evidence suggests that if current ML systems were lying in wait with treacherous plans and instrumentally acting nice for now, we wouldn’t be able to train away the treachery” seems reasonable from what I’ve read, and so does “some evidence suggests that if current ML systems were trying to deceive us, standard methods might well fail to change them not to”.