I just want the trade-off to be made explicitly. If it turns out that −7 people in expectation is better than thinking about utility functions and all other alternatives, then fine. But that is an argument that depends on actual numbers. Yes, it's possible to think informally and correctly. But maybe “an alignment researcher who thinks carefully about x-risk” wasn't what was actually happening.
To be clear, this argument does not apply to more powerful systems!
Before running InstructGPT, what was the technical reason to expect it wouldn't be powerful?