My guess at the agent foundations answer would be that, sure, human value needs to survive the transition from “the smartest thing alive is a human” to “the smartest thing alive is a transformer trained by gradient descent”, but it also needs to survive the transition from “the smartest thing alive is a gradient-descent transformer” to “the smartest thing alive is coded by an aligned transformer, but internally is meta-SHRDLU / a hand-coded tree searcher / a transformer trained by the update rule used in human brains / a simulation of 60,000 copies of Obama wired together / etc.” — and stopping progress in order to prevent that second transition is just as hard as stopping now to prevent the first one.