No, that’s not the question I was asking. Humans are able to start using grammatical languages on the basis of no observations of grammatical language whatsoever—not in the pretraining, not in the training, not in text form, not in audio form, not in video form. Again, I mentioned Nicaraguan Sign Language, or the creation of creoles from pidgins, or for that matter the original creation of language by hominins.
So this has nothing to do with sample-efficiency. There are zero samples.
I don’t think you can take one or more randomly-initialized transformers, and get grammatical language out of them, without ever putting any human-created grammatical language into them. Do you? If so, how?
I agree that my statements about sample efficiency do not address this point. I do think you could get transformers to invent language without seeing language data. You would want to use online learning in an observation/state/action loop while interacting with an environment, and probably include optimizations from ReAct, Reflexion, AutoGPT, and Voyager. But each of these relies on having some core language model that can do reasoning, and the way we normally get those is by pre-training on language. I could imagine instead pre-training on solutions to another problem that is arbitrarily hard to compute, simple to verify, and provides a natural learning gradient. For example, the LM could be given a numpy program f and an output f(x), and for a guess y it would get loss L2(f(x), f(y)). Or it could try to guess zeros of polynomials, and be penalized according to the square of the polynomial evaluated at the guess (see the sketch below). Then put the agents together in a way such that they can communicate through their input and output channels, and I suspect they will be able to create language. Maybe language is not so hard—level 1 is just using words to point at concepts you already have. Then learning to compose those words is just a matter of more time-steps, given sufficient parameter capacity in your networks.
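To make the proposed pre-training signal concrete, here is a minimal sketch of the two verifiable losses, assuming f is some arbitrary numpy program; all names here (f, preimage_loss, polynomial_zero_loss) are hypothetical illustrations, not an existing API:

```python
import numpy as np

def f(v: np.ndarray) -> np.ndarray:
    # Stand-in for an arbitrary numpy program: cheap to run forward,
    # hard to invert analytically.
    return np.sin(v) + 0.5 * v**2

def preimage_loss(target_output: np.ndarray, guess: np.ndarray) -> float:
    # L2(f(x), f(y)): the model sees f and f(x), proposes a preimage
    # guess y, and is scored by how close f(y) lands to f(x).
    # Verification is just one forward evaluation of f.
    return float(np.linalg.norm(target_output - f(guess)))

def polynomial_zero_loss(coeffs: np.ndarray, guess: float) -> float:
    # Second task: guess a zero of a polynomial. The penalty is the
    # square of the polynomial evaluated at the guess, so it is zero
    # exactly when the guess is a root.
    return float(np.polyval(coeffs, guess) ** 2)

# Usage:
x = np.array([1.0, -2.0, 0.5])
target = f(x)                                    # the output the model is shown
print(preimage_loss(target, np.zeros(3)))        # poor guess -> large loss
print(preimage_loss(target, x))                  # exact preimage -> 0.0
print(polynomial_zero_loss(np.array([1.0, 0.0, -4.0]), 2.0))  # x^2 - 4 at 2 -> 0.0
```

Both losses have the property the paragraph asks for: verification is a single cheap forward evaluation, good guesses can be arbitrarily hard to produce, and the loss shrinks smoothly as guesses improve, which is what supplies the learning gradient.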
To say this, you would have to argue that humans without this feature would have led to a faster singularity, more or less.
I am saying it is hard to know whether a feature of a person gives rise to better communication in the whole group, which makes my theory conveniently hard to test. And then I am pointing at the singularity as a limiting object (from our point of view) of increasing communication, one that continues the trend running through DNA, language, the printing press, phones, the internet, and AI.
Your post says “Let’s imagine a hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways.” OK, now:
- It is possible in principle to program an AI that is exactly like a human sociopath’s brain.
- It is possible in principle to put that AI in a human-like body and raise it in a loving human family in a normal human neighborhood, enroll it in school, etc.
- Presumably, if I did both these things, this would be a central example of “a hypothetical scenario where an AI is somehow trained in a way that is analogous to a human childhood in all of the relevant ways”, according to a reasonable interpretation of those words.
- And if I did both these things, I would wind up creating an AI that is just like an adult high-functioning human sociopath: the kind of person that emotionally abuses people just for fun, with callous disregard for the well-being of anyone but themselves, and that is constitutionally incapable of guilt or remorse, etc.
Where if anywhere do you disagree?
For the bullets:
- Agree, and I think that AI won’t last long in the world, but it might last long enough to destroy humans.
- Agree.
- Agree.
Thank you for bringing my post into an empirical domain I had not been thinking about. So I will modify my claim to ‘there exists a competence level α such that for all agents with competence level β ≥ α, nurture matters more than nature’, where ‘matters more than’ also needs to be made precise. Now the question is locating α, for which it would be useful to understand how common it is for a person to have a high-quality upbringing (in a multi-faceted sense) and still end up self-interested. Though I wonder if size of moral circle is the right metric.
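As a sketch, the modified claim could be written symbolically; the predicate M below is my hypothetical notation for “nurture matters more than nature at competence β”, and the variance-explained reading is just one candidate way to make it precise:

```latex
% M(\beta) is a hypothetical placeholder predicate, e.g. "among agents
% of competence \beta, environment explains more variance in values
% than genes do". Making M precise is the open part of the claim.
\exists \alpha \;\; \forall \beta \ge \alpha : \; M(\beta),
\qquad \text{e.g. } M(\beta) \;:\iff\;
\mathrm{Var}_{\mathrm{env}}(\beta) \;>\; \mathrm{Var}_{\mathrm{gene}}(\beta)
```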
I think people’s personalities are significantly predictable from their genes, and mostly independent of how their parents raised them (at least within the typical distribution, i.e. leaving aside cases of flagrant abuse and neglect etc.). See e.g. popular expositions of this theory by Judith Harris or by Bryan Caplan for the fine print and massive body of supporting evidence (e.g. twin studies and adoption studies). Antisocial personality disorder / sociopathy follows the usual pattern like everything else—it’s substantially predictable based on genes, almost entirely independent of how your parents raise you and other aspects of childhood family environment.
I’m not sure what you mean by “competence”. Mean people and cruel people and high-functioning sociopaths can be very highly “competent” according to how I use that word day-to-day. William Shockley was a brilliant physicist who started a successful company—while also being awful to everyone, vindictive, and a notorious racist. Heck, Hitler himself was extraordinarily charismatic and exquisitely skilled at social manipulation, AFAICT. He achieved one wildly ambitious goal after another. I think I would describe him as a “highly competent” guy.
Thanks!