I think GPT-3 should be viewed as roughly as aligned as IDA would be if we pursued IDA using our current understanding. GPT-3 is trained via self-supervised learning (which is, on the face of it, myopic), so the only obvious x-safety concern is something like mesa-optimization.
In my mind, the main argument for IDA being safe is still myopia.
I think GPT-3 seems safer than (recursive) reward modelling, CIRL, or any other alignment proposals based on deliberately building agent-y AI systems.
--------------------
In the above, I’m ignoring the ways in which any of these systems might increase x-risk via their (e.g. destabilizing) social impact and/or their contribution to accelerating timelines.