Thomas Larsen comments on Thomas Larsen’s Shortform

Thomas Larsen 28 May 2024 1:04 UTC
Yeah, actual FLOPs are the baseline thing that’s used in the EO. But the OpenAI/GDM/Anthropic RSPs all reference effective FLOPs.
If there’s a large algorithmic improvement, you might have a large gap in capability between two models trained with the same FLOPs, which is not desirable. Ideal thresholds in regulation / scaling policies are tied as tightly as possible to the risks.
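To make the gap concrete, here is a rough sketch of how a raw-FLOP threshold and an effective-FLOP threshold can disagree about the same training run. It assumes effective FLOPs can be modeled as raw FLOPs times an algorithmic-efficiency multiplier relative to some reference recipe; that definition, the function names, and the 4x multiplier are all illustrative assumptions, not taken from the EO or any lab's policy.

```python
# Hypothetical sketch: effective FLOPs modeled as raw FLOPs scaled by an
# assumed algorithmic-efficiency multiplier (1.0 = the reference recipe).

def effective_flops(raw_flops: float, efficiency_multiplier: float) -> float:
    """Raw training compute adjusted for assumed algorithmic progress."""
    if raw_flops < 0 or efficiency_multiplier <= 0:
        raise ValueError("raw_flops must be >= 0 and efficiency_multiplier > 0")
    return raw_flops * efficiency_multiplier


def crosses_threshold(flops: float, threshold: float) -> bool:
    """True if the given compute figure meets or exceeds the threshold."""
    return flops >= threshold


THRESHOLD = 1e26  # illustrative threshold, in FLOPs
RAW = 5e25        # both models use the same raw training compute

baseline = effective_flops(RAW, 1.0)  # reference algorithms
improved = effective_flops(RAW, 4.0)  # assumed 4x algorithmic improvement

# A raw-FLOP rule treats both models identically (neither is covered),
# while an effective-FLOP rule covers only the algorithmically improved one.
raw_rule_covers_either = crosses_threshold(RAW, THRESHOLD)
eflop_rule_covers_baseline = crosses_threshold(baseline, THRESHOLD)
eflop_rule_covers_improved = crosses_threshold(improved, THRESHOLD)
```

The hard part in practice is estimating the multiplier, which this sketch simply takes as given.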
Another downside that FLOPs / E-FLOPs share is that it’s unpredictable what capabilities a 1e26 or 1e28 FLOPs model will have. It’s also unclear what capabilities will emerge from a small amount of scaling: it’s possible that within a 4x FLOP scale-up you get high capabilities that had not appeared at all in the smaller model.