I’m definitely not claiming that it is likely we will manage to catch AIs doing such egregiously bad actions, but I think if we did catch them, this would provide an adequate warning shot.
I think that if an effective control approach is employed, early transformatively useful AIs are dangerously misaligned, and these early misaligned AIs are unwilling or unable to punt to a later generation of AIs, then catching AIs red-handed is pretty likely relative to other ways of ensuring safety. All of the assumptions I stated as conditions here seem pretty plausible to me.
I’ll admit I have only been loosely following the control stuff, but FWIW I would be excited about a potential @peterbarnett & @ryan_greenblatt dialogue in which you two try to identify & analyze any potential disagreements. Example questions:
What is the most capable system that you think we are likely to be able to control?
What kind of value do you think we could get out of such a system?
To what extent do you expect that system to be able to produce insights that help us escape the acute risk period (i.e., get out of a scenario where someone else can come along and build a catastrophe-capable system without implementing control procedures, or where someone else scales up to the point where the control procedures are no longer sufficient)?
(You might be interested in the Dialogue I already did with habryka.)