Ajeya Cotra comments on Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra 19 Jul 2022 15:23 UTC
6 points
I’m still fairly optimistic about sandwiching. I deliberately considered a set of pretty naive strategies (the “naive safety effort” assumption) to contrast with future posts, which will explore strategies that seem more promising. Carefully constructed versions of debate, amplification, recursive reward modeling, etc. seem like they could make a significant difference and could be tested through a framework like sandwiching.
- Jérémy Scheurer 20 Jul 2022 15:08 UTC
  1 point
  Thanks for clearing that up. This is super helpful context for understanding this post.