Zach Stein-Perlman comments on Zach Stein-Perlman’s Shortform

Zach Stein-Perlman 9 Nov 2024 4:37 UTC
2 points
0
We can make scheming legible at least to sophisticated-scheming-skeptics like Sam Altman and Dario.
Especially if the AI is sandbagging even on simple coding when it thinks it’s for safety research. And if it’s not doing that, we can get some useful safety work out of it.
@Adam Kaufman @Tyler Tracy @David Matolcsi see Ryan’s comments.
- ryan_greenblatt 9 Nov 2024 4:49 UTC
  8 points
  6
  
  We can make scheming legible at least to sophisticated-scheming-skeptics like Sam Altman and Dario.
  
  If it was enough evidence that I was strongly convinced, sure. But IDK if I would be convinced, because the evidence might actually be unclear.
- ryan_greenblatt 9 Nov 2024 4:41 UTC
  6 points
  2
  I agree you’ll be able to get some work out, but you might be taking a big productivity hit.
  
  Also, TBC, I’m not generally that worried about generic sandbagging on safety research relative to other problems.