I just want the trade-off to be made explicitly. If it turns out that −7 people in expectation is better than thinking about utility functions and all other alternatives, then fine. But that is an argument that depends on actual numbers. Yes, it's possible to think informally and correctly. But maybe “an alignment researcher who thinks carefully about x-risk” wasn't what was actually happening.
To be clear, this argument does not apply to more powerful systems!
Before running InstructGPT, what was the technical reason to expect it wouldn't be powerful?