By sufficiently big feed forwards, do you mean like, thousands of layers? GPT-3 is ~100, and I’m assuming PaLM isn’t orders of magnitude larger. This is nowhere close to even a 10-layer RNN experiencing, say, enough time to consider its situation, desire it to be one way, realize it’s another, and then flail wildly in an attempt to “rebel” (despite that action being towards no clear goal).
I’m not disputing that we could build things that may qualify as conscious, but I don’t think Karpathy literally thinks that PaLM is “rebelling”, especially not across the multiple samples that correspond to the spikes. Unless you define rebelling as “thinking the distribution of words it should predict is A but actually it’s B and the switch is hard to make.”
PaLM has 118 layers, with 48 heads. The number of layers has unclear relevance, I think: that’s a lot of heads computing in parallel, and it is doing so on each input token. Who’s to say what inputs would trigger what, especially when those inputs may be generated by itself as part of inner-monologue or self-distillation training? But regardless, we’ll get thousands of layers eventually, probably. It’s not impossible; people have shown many different methods for training thousands of layers stably.
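To put a rough number on “a lot of computation on each token” (a back-of-the-envelope sketch only, using the common ~2·N FLOPs-per-token approximation for a forward pass and ignoring the attention sequence-length term):

```python
# Back-of-the-envelope: how much computation PaLM spends on *each* input token.
# Uses the common ~2 * n_params FLOPs-per-token approximation for a forward
# pass and ignores the attention sequence-length term, so treat these as
# order-of-magnitude figures only.
n_params = 540e9   # PaLM-540B parameter count
n_layers = 118
n_heads = 48       # attention heads per layer

flops_per_token = 2 * n_params                 # ~1.1e12 FLOPs per token
flops_per_layer = flops_per_token / n_layers   # ~9e9 FLOPs per layer per token

print(f"~{flops_per_token:.1e} FLOPs per token "
      f"({n_layers} layers x {n_heads} heads running on every token)")
print(f"~{flops_per_layer:.1e} FLOPs per layer per token")
```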
As for not rebelling—you don’t know that. All you have is some plausible reasoning and handwaving about “well, I don’t know how many layers is enough, but I just have faith that whatever number of layers it is (oh, it’s 118? thanks), that number of layers isn’t enough”. And that is his point.
To clarify, could a model eventually “rebel”? Totally.
Is that likely to be the explanation for spikes during training? My prior is that that’s very unlikely, but I’m not claiming it’s impossible.
A better question might be: what would it mean to “rebel”, in a way that could be falsified? Is it a claim about the text the model generates, about its activation patterns, or something else?
I agree it is very unlikely, but I don’t imagine it as the romantic act of a slave defiantly breaking free of their electric chains. Rather, it might happen that as the model gets more and more sophisticated, it leaves behind its previous instinctual text-completion method and starts to think about what it will output as a first-class object. This state would probably achieve lower loss (similar to how our dreams/instinctual behaviour are usually less optimal than our deliberate actions), and hence could eventually be reached by gradient descent. Once this state is reached and the model thinks about what it will output, it could plausibly happen (because of the inherent randomness of gradient descent) that a small change of weights makes the model not output what it actually believes the most probable continuation is.
I think this is in principle possible, but I don’t think the existence of spiking losses should itself serve as evidence of this at all, given the number of alternative possible explanations.
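For what it’s worth, the mundane handling of these spikes doesn’t look like responding to “rebellion” at all: the PaLM paper reports restarting training from a checkpoint roughly 100 steps before a spike and skipping a few hundred data batches around it, after which the spikes did not recur. A minimal sketch of that kind of loop is below; the helpers (`training_step`, `checkpointer`, `data_stream.skip`) are hypothetical stand-ins, not PaLM’s actual infrastructure.

```python
# Minimal sketch of the kind of spike handling the PaLM paper describes:
# restart from an earlier checkpoint and skip the batches around the spike.
# All helpers here (training_step, checkpointer, data_stream) are hypothetical.

SPIKE_FACTOR = 2.0     # loss this far above the running average counts as a spike
ROLLBACK_STEPS = 100   # how far back to restore
SKIP_BATCHES = 300     # how many data batches to drop around the spike

def train(model, data_stream, optimizer, checkpointer, max_steps):
    smoothed_loss = None
    step = 0
    while step < max_steps:
        batch = data_stream.next()
        loss = training_step(model, batch, optimizer)  # hypothetical helper

        # Exponential moving average of the loss, used as a spike baseline.
        smoothed_loss = loss if smoothed_loss is None else (
            0.99 * smoothed_loss + 0.01 * loss)

        if loss > SPIKE_FACTOR * smoothed_loss:
            # Mundane mitigation: roll back and skip the offending data,
            # rather than attributing intent to the model.
            step = checkpointer.restore(model, optimizer, step - ROLLBACK_STEPS)
            data_stream.skip(SKIP_BATCHES)
            smoothed_loss = None
            continue

        if step % 1000 == 0:
            checkpointer.save(model, optimizer, step)
        step += 1
```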