It seems that you are more worried about ‘foom in danger’ (danger per intelligence, D / I) than about regular foom (a 4+ OOM increase in I), if I am reading you correctly. I don’t see a technical argument that, e.g., the claims in OP about any of
I/flop, flop/J, total(J), flop/$, or total($)
are wrong; you are just saying that ‘D / I will foom at some point’ (i.e., a model becomes much more dangerous quickly, without needing to be vastly more powerful algorithmically or having much more compute).
This doesn’t change things much, but I just want to understand better what you mean when you say ‘foom’.
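A rough way to write the decomposition referred to above, as I read it. The symbols follow the comment; the multiplicative form and the names I_total and D_total are my assumptions, not something OP states in this thread:

```latex
% Assumed reading: each total is a product of the ratios named above.
\begin{align*}
I_{\text{total}} &= \frac{I}{\text{flop}} \cdot \frac{\text{flop}}{\text{J}} \cdot \text{total}(\text{J})
  && \text{(energy-limited)} \\
I_{\text{total}} &= \frac{I}{\text{flop}} \cdot \frac{\text{flop}}{\$} \cdot \text{total}(\$)
  && \text{(budget-limited)} \\
D_{\text{total}} &= \frac{D}{I} \cdot I_{\text{total}}
\end{align*}
```

On this reading, a ‘foom in danger’ would be a rapid rise in D / I while every factor of I_total stays roughly fixed.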
I don’t think I should clarify further right now, though I could potentially be convinced otherwise. I’d need to think about precisely what I want to highlight. It’s not like it’ll be that long before it becomes glaringly obvious, but I don’t currently see a reason why clarifying this particular aspect makes us safer.
That’s fair; however, I would say that the manner of foom determines a lot about what to look out for and where to put safeguards.
If it’s total($), it’s obvious what to look out for.
flop/$ also seems like something that, e.g., NVIDIA is tracking closely and, per OP, probably can’t foom too rapidly absent nanotech.
So the argument is something about the (D*I)/flop dynamics.
[redacted] I wrote more here, but it’s probably best left unsaid for now. I think we are on a similar enough page.