GPT-2 does not—probably, very probably, but of course nobody on Earth knows what’s actually going on in there—does not in itself do something that amounts to checking possible pathways through time/events/causality/environment to end up in a preferred destination class despite variation in where it starts out.
A blender may be very good at blending apples; that doesn’t mean it has a goal of blending apples.
A blender that spit out oranges as unsatisfactory, pushed itself off the kitchen counter, stuck wires into electrical sockets in order to burn open your produce door, grabbed some apples, and blended those apples, on more than one occasion in different houses or with different starting conditions, would much more get me to say, “Well, that thing probably had some consequentialism-nature in it, about something that cashed out to blending apples” because it ended up at highly similar destinations from different starting points in a way that is improbable if nothing is navigating Time.
It doesn’t seem crazy to me that a GPT-type architecture with the “Stack More Layers” approach could eventually model the world well enough to simulate consequentialist plans, i.e. given a prompt like:
“If you are a blender with legs in environment X, what would you do to blend apples?”, provide a continuation with a detailed plan like the one above (and GPT-4/5 etc., with more compute, giving slightly better plans, maybe eventually at a superhuman level).
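Concretely, the setup I have in mind is just: feed the prompt to a causal language model and sample a continuation. A minimal sketch using the off-the-shelf GPT-2 checkpoint via Hugging Face transformers (the prompt and sampling parameters are only illustrative; GPT-2 itself won’t produce anything like a coherent plan, the question is what a much larger successor would continue this with):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Off-the-shelf GPT-2; any larger causal LM checkpoint could be swapped in.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("If you are a blender with legs in environment X, "
          "what would you do to blend apples?")

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,                    # length budget for the hoped-for "plan"
    do_sample=True,                        # sample a continuation rather than greedy-decode
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```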
It also seems like it could do this kind of consequentialist thinking without itself having any “goals” to pursue. I’m expecting the response to be one of the following, but I’m not sure which:
1. “Well, if it’s already making consequentialist plans, surely it has some goal, like maximizing the amount of text it generates etc., and will try to do whatever it can to ensure that (similar to the “consequentialist AlphaGo” example in the conversation) instead of just letting itself be turned off.”
2. An LLM / GPT will never be able to reliably output such plans with the current architecture or type of training data.
3. Exaggerating: a very good random number generator can sometimes output numbers corresponding to some consequentialist plan, but it’s not very useful as a consequentialist.
4. Exaggerating less: an LLM trained to a superhuman level can produce consequentialist plans, but it can also produce many non-consequentialist, useless plans. If you want it to reliably make good plans (better than a human’s), you have to apply some optimization pressure, like RLHF (see the sketch after this list).
5. There’s a difference between “what would you do to blend apples” and “what would you do to unbox an AGI”. It’s not clear to me whether that is just a difference of degree or something deeper.
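To make option 4 a little more concrete: the crudest form of “optimization pressure” on top of a raw LLM is best-of-n selection against some reward signal (RLHF goes further and actually trains the model on that signal, but the selection intuition is similar). Purely as a toy sketch, with a made-up keyword-counting reward standing in for a learned reward model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = ("If you are a blender with legs in environment X, "
          "what would you do to blend apples?")

def toy_reward(text: str) -> float:
    # Stand-in for a learned reward model: crude keyword counting.
    # (Entirely made up for illustration; a real setup would score plans
    # with a trained reward model, not string matching.)
    keywords = ["apple", "blend", "plug", "counter", "power"]
    return float(sum(text.lower().count(w) for w in keywords))

inputs = tokenizer(prompt, return_tensors="pt")
candidates = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=80,
    num_return_sequences=8,              # sample several candidate "plans"
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
plans = [tokenizer.decode(c, skip_special_tokens=True) for c in candidates]
best_plan = max(plans, key=toy_reward)   # keep only the highest-scoring sample
print(best_plan)
```

The point of the sketch is only that the selection step, not the sampling step, is where the “pressure” lives; pure sampling just gives you option 3’s lottery.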