My guess at the agent foundations answer would be that, sure, human value needs to survive the transition from “the smartest thing alive is a human” to “the smartest thing alive is a transformer trained by gradient descent”, but it also needs to survive the transition from “the smartest thing alive is a gradient-descent transformer” to “the smartest thing alive is coded by an aligned transformer, but internally is meta-SHRDLU / a hand-coded tree searcher / a transformer trained by the update rule used in human brains / a simulation of 60,000 copies of Obama wired together / etc.” — and stopping progress in order to prevent that second transition is just as hard as stopping now to prevent the first one.