Moreover, there is a core difference in how the cost of brain size grows for humans versus AI (sublinear vs. linear).
Actually, I was imagining that for humans the cost of brain size grows superlinearly. The paper you linked uses a quadratic function, and also tried an exponential and found similar results.
But in the world where AI dev faces hardware constraints, social learning will be much more useful.
Agreed if the AI uses social learning to learn from humans, but that only gets you to human-level AI. If you want to argue for something like fast takeoff to superintelligence, you need to talk about how the AI learns independently of humans, and in that setting social learning won’t be useful given linear costs.
E.g. Suppose that each unit of adaptive knowledge requires one unit of asocial learning. Every unit of learning costs $K, regardless of brain size, so that everything is linear. No matter how much social learning you have, the discovery of N units of knowledge is going to cost $KN, so the best thing you can do is put N units of asocial learning in a single brain/model so that you don’t have to pay any cost for social learning.
In contrast, if N units of asocial learning in a single brain costs $KN^2, then having N units of asocial learning in a single brain/model is very expensive. You can instead have N separate brains each with 1 unit of asocial learning, for a total cost of $KN, and that is enough to discover the N units of knowledge. You can then invest a unit or two of social learning for each brain/model so that they can all accumulate the N units of knowledge, giving a total cost that is still linear in N.
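To make the comparison concrete, here is a minimal sketch of the two cost regimes. I'm assuming (beyond what's stated above) that each unit of social learning also costs $K; the specific constants are illustrative.

```python
# Cost comparison under the two regimes: linear vs. quadratic asocial learning cost.
K = 1.0  # cost per unit of learning (scale; "hardware constraints" only change this)
N = 100  # units of adaptive knowledge to discover

# Linear regime: one brain/model does all N units of asocial learning.
single_brain_linear = K * N  # $KN -- no benefit from splitting across brains

# Quadratic regime, option A: one brain does all N units of asocial learning.
single_brain_quadratic = K * N**2  # $KN^2 -- very expensive

# Quadratic regime, option B: N brains with 1 unit of asocial learning each,
# plus ~2 units of social learning per brain to accumulate all N discoveries
# (the per-brain social-learning cost of 2K is an illustrative assumption).
distributed_quadratic = N * (K * 1**2) + N * (2 * K)  # still linear in N

print(single_brain_linear)     # 100.0
print(single_brain_quadratic)  # 10000.0
print(distributed_quadratic)   # 300.0
```

Note that scaling K up or down (higher or lower hardware costs) rescales all three quantities equally, so it doesn't change which strategy wins in each regime.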
I’m claiming that AI is more like the former while this paper’s model is more like the latter. Higher hardware constraints only change the value of K, which doesn’t affect this analysis.