Simplicia: But how do you know that? Obviously, an arbitrarily powerful expected utility maximizer would kill all humans unless it had a very special utility function. Obviously, there exist programs which behave like a webtext-next-token-predictor given webtext-like input but superintelligently kill all humans on out-of-distribution inputs. Obviously, an arbitrarily powerful expected utility maximizer would be good at predicting webtext. But it’s not at all clear that using gradient descent to approximate the webtext next-token-function gives you an arbitrarily powerful expected utility maximizer. Why would that happen? I’m not denying any of the vNM axioms; I’m saying I don’t think the vNM axioms imply that.