under the assumptions here (including Chinchilla scaling laws), depth wouldn’t increase by more than about 3x before the utilization rate starts dropping (because depth would increase with an exponent of about 1⁄6 of the total increase in FLOP), which seems like great news for the legibility of CoT outputs and the like, as opposed to opaque reasoning inside models: https://lesswrong.com/posts/HmQGHGCnvmpCNDBjc/current-ais-provide-nearly-no-data-relevant-to-agi-alignment#mcA57W6YK6a2TGaE2
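The 1⁄6 exponent can be sketched as follows, under assumptions not spelled out in the comment: Chinchilla-optimal parameter count scales roughly as N ∝ C^(1/2) in training FLOP C, and at a fixed width-to-depth aspect ratio a transformer's parameter count scales roughly as depth × width² ∝ depth³, giving depth ∝ C^(1/6). A minimal sketch (the function name and default exponents are illustrative, not from the source):

```python
def depth_multiplier(flop_multiplier: float,
                     chinchilla_exp: float = 0.5,
                     params_per_depth_exp: float = 3.0) -> float:
    """How much depth grows for a given growth in training FLOP.

    Assumes (illustratively): params N ∝ C^chinchilla_exp (Chinchilla),
    and N ∝ depth^params_per_depth_exp at a fixed width/depth aspect
    ratio, so depth ∝ C^(chinchilla_exp / params_per_depth_exp) ≈ C^(1/6).
    """
    depth_exp = chinchilla_exp / params_per_depth_exp  # ≈ 1/6
    return flop_multiplier ** depth_exp

# Under these assumptions, a 3x depth increase corresponds to
# roughly 3^6 = 729x more training FLOP:
print(depth_multiplier(729))  # ≈ 3.0
```

On this sketch, even very large compute scale-ups move depth only modestly, which is the basis for the ~3x figure in the comment.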