“Safe” as in “safe enough that it’s on net better to run it” or “safe enough that it wouldn’t definitely kill everyone”. It’s not that I don’t share the popular intuition that GPT wouldn’t kill anyone. It’s just that I don’t think it’s a good habit to run progressively more capable systems while relying on informal intuitions about their safety. And maybe then I’ll see an explanation of why future safety tools would outpace capability progress, when we are already at the point where current safety tools are not applicable to current AI systems.
I’m pretty unconvinced by this. I do not think that any substantial fraction of AI x-risk comes from an alignment researcher who thinks carefully about x-risk deciding that a GPT-3-level system isn’t scary enough to warrant significant precautions around boxing.
I think taking frivolous risks is bad, but risk aversion to the point of not being able to pursue otherwise promising research directions seems pretty costly, while the benefits of averting risks >1e-9 are pretty negligible in comparison.
(To be clear, this argument does not apply to more powerful systems! As systems get smarter we should be more careful, and try to be very conservative! But ultimately everything is a trade-off: letting GPT-3 talk to human contractors who are giving feedback is a way of letting it out of the box!)
I just want the trade-off to be made explicit. If it turns out that −7 people in expectation is better than thinking about utility functions and all the other alternatives, fine. But that’s an argument that depends on actual numbers. Yes, it’s possible to think informally and correctly. But maybe “an alignment researcher who thinks carefully about x-risk” wasn’t what was actually happening.
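To make explicit what kind of calculation I mean, here is a minimal sketch of the comparison in Python. Every number in it (the catastrophe probability, its cost, the value of the research, and the value of the alternatives) is a hypothetical placeholder chosen for illustration, not an estimate about GPT-3 or InstructGPT.

```python
# Illustrative only: an explicit expected-cost comparison between
# "run the experiment with light precautions" and "spend the effort on
# alternatives (e.g. thinking about utility functions)".
# All numbers below are hypothetical placeholders.

p_catastrophe = 1e-9        # assumed probability the run causes a catastrophe
catastrophe_cost = 7e9      # assumed cost in lives if it does
research_value = 10.0       # assumed value of running, in the same arbitrary units
alternative_value = 2.0     # assumed value of the best alternative use of the effort

expected_loss_run = p_catastrophe * catastrophe_cost   # lives lost in expectation
net_value_run = research_value - expected_loss_run
net_value_alternative = alternative_value

print(f"expected loss from running: {expected_loss_run:.1f}")
print(f"net value of running:       {net_value_run:.1f}")
print(f"net value of alternative:   {net_value_alternative:.1f}")
print("run it" if net_value_run > net_value_alternative else "do the alternative")
```

The point is not these particular numbers, but that the decision turns on them, so they should be stated rather than left to intuition.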
To be clear, this argument does not apply to more powerful systems!
Before running InstructGPT, what was the technical reason to think it wouldn’t be powerful?
Who is claiming that it is safe? I didn’t get that implication from the post.