What feels underexplored to me is: If we can control roughly human-level AI systems, what do we DO with them?
Automated or strongly augmented AI risk mitigation research, among various other options that Redwood discusses in their posts and public appearances.
Doesn’t this just shift what we worry about? If controlling roughly human-level and slightly superhuman systems turns out to be easy, that still leaves:
Human institutions using AI to centralize power
Conflict between human-controlled AI systems
“Going out with a whimper” scenarios (or other multi-agent problems)
Not understanding the reasoning of vastly superhuman AI (even with CoT)