I agree with the other comments here suggesting that working hard enough on an animal’s language patterns in LLMs will develop models of that animal’s world based on that language use, and so develop better-contextualized answers to these reading comprehension questions, all with no direct experience of the world.
The SVG stuff is an excellent example of there being explicit shortcuts available in the data set. Much of that language use by humans, and their embodied world/worldview/worldmaking, is not that explicit. To arrive at that tacit knowledge is interesting.
If we are beyond the stochastic parrot, now or soon, are we at the stage of a stochastic maker of organ-grinders and their monkeys? (One that can churn out explicit lyrics about the language/grammar animals and their avatars use to build their worlds/markets.)
If so, there may be a point where we are left asking: who is master, the monkey or the organ? And thus we miss the entire point?
Poof. The singularity has left us behind, wondering what that noise was.
Are we there yet?
I partially agree. I think stochastic-parrot-ness is a spectrum; even humans behave as stochastic parrots sometimes (for me, it’s when I am tired). I think, though, that we don’t really know what an experience of the world really is, and so the only way to talk about it is through an agent’s behaviors. The point of this post is that SOTA LLMs are probably farther along the spectrum than most people expect (my impression from experience is that GPT-4 is ~75% of the way between total stochastic parrot and human). It is better than humans at some tasks (some specific ToM experiments, like the example in argument 2), but still worse at others (like applying nuance: it can understand nuances, but when you want it to actually act with nuance, you only see the difference when you ask for different things). I think it is important to build a measure of stochastic-parrot-ness, as this might be a useful metric for governance and a better proxy for “does it understand the world it is in?” (which I think is important for most of the realistic doom scenarios). Also, these experiments are a way to give a taste of what LLM psychology looks like.
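To make the idea of such a measure slightly more concrete, here is a minimal toy sketch of one possible proxy: scoring how much of a model’s output consists of long n-grams copied verbatim from a reference corpus. This is purely illustrative; the function names, the choice of n, and the toy corpus are my own assumptions, not an established metric.

```python
# Hypothetical sketch of a crude "stochastic parrot-ness" proxy:
# the fraction of a model output's n-grams that appear verbatim
# in a reference corpus. 1.0 ~ pure parroting, 0.0 ~ no verbatim reuse.

def ngrams(tokens, n):
    """Return the set of all n-grams (as tuples) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def parrot_score(output: str, corpus: str, n: int = 5) -> float:
    """Fraction of the output's n-grams found verbatim in the corpus."""
    corpus_ngrams = ngrams(corpus.lower().split(), n)
    out_ngrams = ngrams(output.lower().split(), n)
    if not out_ngrams:
        return 0.0
    copied = sum(1 for g in out_ngrams if g in corpus_ngrams)
    return copied / len(out_ngrams)

if __name__ == "__main__":
    corpus = "the cat sat on the mat and looked out of the window at the rain"
    verbatim = "the cat sat on the mat and looked out of the window"
    novel = "a small tabby perched quietly watching raindrops slide down the glass"
    print(parrot_score(verbatim, corpus))  # close to 1.0: mostly copied
    print(parrot_score(novel, corpus))     # 0.0: no 5-gram overlap
```

A real measure would of course need behavioral probes (like the ToM examples above) rather than surface n-gram overlap, but it shows the kind of scalar one could track as a governance metric.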
Given that, in the limit (infinite data and infinite parameters in the model), LLMs are world simulators with tiny simulated humans inside writing text on the internet, the pressure applied to that simulated human is not to understand our world, but to understand that simulated world and be an agent inside it, which I think gives some hope.
Of course, real-world LLMs are far from that limit, and we have no idea which path gradient descent takes toward it. Eliezer famously argued about the whole “simulator vs. predictor” issue, which I think is relevant to that intermediate state far from the limit.
Also, RLHF applies additional weird pressures, for example a pressure to be aware that it is an AI (or at least to pretend that it is aware, whatever that might mean), which makes fine-tuned LLMs actually less safe than raw ones.