I think whether the additional complexity is mundane or not depends on how you’re producing the agent. Humans can scale up human-designed engineering products fairly easily, because we have a high-level understanding of how the components all fit together. But if you have a big neural net whose internal composition is mostly determined by the optimiser, then it’s much less clear to me. There are some scaling operations which are conceptually very easy for humans but hard to do via gradient descent. As a simple example, in a big neural network where the left half is doing subcomputation X and the right half is doing subcomputation Y, it’d be very laborious for the optimiser to swap them so that the left half does Y and the right half does X, since the optimiser can only change the network gradually, and after each gradient update the whole thing needs to still work. This may be true even if swapping X and Y is a crucial step towards scaling up the whole system, one which would later allow much better performance.
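To make the swap example a bit more concrete, here’s a toy sketch (plain NumPy; the task, width and hyperparameters are arbitrary choices for illustration, not anything from the original discussion). It trains a small one-hidden-layer ReLU net, builds a copy in which the two halves of the hidden layer are swapped (a consistent permutation of units, so it computes exactly the same function), and then walks the straight line in parameter space between the two settings.

```python
# Toy sketch of the "swap the halves" point: the swapped network is functionally
# identical, but the direct path between the two parameter settings passes
# through much worse networks.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: fit y = sin(3x) on [-1, 1].
X = np.linspace(-1, 1, 256).reshape(-1, 1)
Y = np.sin(3 * X)

H = 64  # hidden width (even, so it splits into a left and a right half)
W1 = rng.normal(0, 1.0, (1, H))
b1 = rng.normal(0, 0.5, H)
W2 = rng.normal(0, 0.1, (H, 1))
b2 = np.zeros(1)

def forward(params, X):
    W1, b1, W2, b2 = params
    h = np.maximum(0, X @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

def loss(params, X, Y):
    return np.mean((forward(params, X) - Y) ** 2)

# Plain full-batch gradient descent.
lr = 0.02
for step in range(20000):
    h_pre = X @ W1 + b1
    h = np.maximum(0, h_pre)
    err = 2 * (h @ W2 + b2 - Y) / len(X)   # d(MSE)/d(prediction)
    gW2, gb2 = h.T @ err, err.sum(0)
    dh = (err @ W2.T) * (h_pre > 0)
    gW1, gb1 = X.T @ dh, dh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

theta = (W1, b1, W2, b2)

# Swap the two halves of the hidden layer, permuting incoming and outgoing
# weights consistently so the network's function is unchanged.
perm = np.concatenate([np.arange(H // 2, H), np.arange(H // 2)])
theta_swapped = (W1[:, perm], b1[perm], W2[perm, :], b2)

print("trained loss:       ", loss(theta, X, Y))
print("swapped-halves loss:", loss(theta_swapped, X, Y))  # same function, same loss

# Loss along the straight line between the two settings: the midpoint averages
# mismatched units and is typically a much worse network.
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    mix = tuple((1 - t) * a + t * b for a, b in zip(theta, theta_swapped))
    print(f"t={t:.2f}  loss={loss(mix, X, Y):.4f}")
```

This doesn’t show that no low-loss path between the two settings exists, only that the obvious direct route passes through broken networks; but that’s the flavour of why a rearrangement that’s trivial for a human designer can be awkward for an optimiser which has to keep everything working at every step.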
In other words, we’re biased towards thinking that scaling is “mundane” because human-designed systems scale easily (and to some extent, because evolution-designed systems also scale easily). It’s not clear that AIs also have this property; there’s a whole lot of retraining involved in going from a small network to a bigger network (and in fact usually the bigger network is trained from scratch rather than starting from a scaled-up version of the small one).
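For what it’s worth, “starting from a scaled-up version of the small one” isn’t purely hypothetical: there are function-preserving expansions in the Net2Net style, where a wider network is initialised to compute exactly what the smaller one did. Here’s a minimal sketch (plain NumPy, with made-up shapes standing in for a trained model); the broader point stands, though, that in practice bigger models are usually just retrained from scratch.

```python
# Minimal sketch of a Net2Net-style width expansion: duplicate each hidden unit
# and split its outgoing weight between the copies, so the wider net starts out
# computing the same function as the small one.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))  # a few dummy inputs with 4 features

# "Small" one-hidden-layer net (random weights standing in for a trained model).
H = 16
W1, b1 = rng.normal(size=(4, H)), rng.normal(size=H)
W2, b2 = rng.normal(size=(H, 1)), rng.normal(size=1)

def forward(W1, b1, W2, b2, X):
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# Widen to 2H units: copy every hidden unit and halve the outgoing weights of
# both copies.
idx = np.concatenate([np.arange(H), np.arange(H)])
W1_big, b1_big = W1[:, idx], b1[idx]
W2_big, b2_big = W2[idx, :] / 2.0, b2.copy()

print(np.allclose(forward(W1, b1, W2, b2, X),
                  forward(W1_big, b1_big, W2_big, b2_big, X)))  # True
```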