I wish we talked more about which particular AI system design principles give us confidence in safety.
For example: context is always erased; humans review ~all context and output tokens.
These principles are likely to disappear unless we center them in our analysis & demands.
Less worried right now about issues like “RLHF incentivizes deception.”
Much more worried about “a pair of mostly-aligned AI systems talk to each other for 3 hours and then make POST requests, and no human actually reviews the full transcript.”
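To make that worry concrete: the difference between the two failure modes is roughly whether side effects are gated on a human reviewing the full transcript. Here is a minimal sketch of such a gate; all names (ReviewGate, ProposedRequest, propose, review_and_release) are hypothetical illustrations, not any real agent framework's API.

```python
# Hypothetical sketch: queue an agent's outbound actions until a human has
# reviewed the full transcript that produced them.
from dataclasses import dataclass, field


@dataclass
class ProposedRequest:
    method: str
    url: str
    body: str
    transcript: str  # full conversation that led to this action


@dataclass
class ReviewGate:
    """Holds proposed side effects until a human approves each one."""
    pending: list = field(default_factory=list)

    def propose(self, request: ProposedRequest) -> None:
        # The agent can only queue actions; nothing leaves the sandbox yet.
        self.pending.append(request)

    def review_and_release(self) -> list:
        approved = []
        for req in self.pending:
            # A human sees the entire transcript, not just the final action.
            print(f"--- transcript ({len(req.transcript)} chars) ---")
            print(req.transcript)
            print(f"Proposed: {req.method} {req.url}")
            if input("Approve this request? [y/N] ").strip().lower() == "y":
                approved.append(req)  # only approved requests would be executed
        self.pending.clear()
        return approved


if __name__ == "__main__":
    gate = ReviewGate()
    gate.propose(ProposedRequest(
        "POST", "https://example.com/api", "{}",
        "agent A <-> agent B, 3-hour exchange...",
    ))
    gate.review_and_release()
```

The point of the sketch is the structural property, not the implementation: if this gate disappears, or humans stop actually reading the transcripts it surfaces, the safety argument that relied on it quietly stops applying.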
Yo Shavit says: