My guess is that this comment would be much more readable with the central chunk of it in a google doc, or failing that a few levels fewer of indented bullets.
e.g. Take this section.
- Thoughts:
  - Do those numbers add up? It seems like if you're worried about models that were trained on 10^26 FLOP in total, you should be worried about much smaller training-speed thresholds than 10^20 FLOP per second? 10^19 FLOP per second would allow you to train a 10^26 model in 115 days, i.e., about 4 months. Those standards don't seem consistent.
  - What do I think about this overall?
    - I mean, I guess reporting this stuff to the government is a good stepping stone for more radical action, but it depends on what the government decides to do with the reported info.
    - The thresholds match those that I've seen in strategy documents of people that I respect, so that seems promising. My understanding is that 10^26 FLOP is about 1-2 orders of magnitude larger than our current biggest models.
    - The interest in red-teaming is promising, but again it depends on the implementation details.
    - I'm very curious about "launching an initiative to create guidance and benchmarks for evaluating and auditing AI capabilities, with a focus on capabilities through which AI could cause harm, such as in the areas of cybersecurity and biosecurity."
      - What will concretely happen in the world as a result of "an initiative"? Does that mean allocating funding to orgs doing this kind of work? Does it mean setting up some kind of government agency like NIST to…invent benchmarks?
I find it much more readable as the following prose rather than 5 levels of bullets. Less metacognition spent tracking the depth.
Thoughts

Do those numbers add up? It seems like if you're worried about models that were trained on 10^26 FLOP in total, you should be worried about much smaller training-speed thresholds than 10^20 FLOP per second? 10^19 FLOP per second would allow you to train a 10^26 model in 115 days, i.e., about 4 months. Those standards don't seem consistent.

What do I think about this overall?

I mean, I guess reporting this stuff to the government is a good stepping stone for more radical action, but it depends on what the government decides to do with the reported info.

The thresholds match those that I've seen in strategy documents of people that I respect, so that seems promising. My understanding is that 10^26 FLOP is about 1-2 orders of magnitude larger than our current biggest models.

The interest in red-teaming is promising, but again it depends on the implementation details.

I'm very curious about "launching an initiative to create guidance and benchmarks for evaluating and auditing AI capabilities, with a focus on capabilities through which AI could cause harm, such as in the areas of cybersecurity and biosecurity."

What will concretely happen in the world as a result of "an initiative"? Does that mean allocating funding to orgs doing this kind of work? Does it mean setting up some kind of government agency like NIST to…invent benchmarks?
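The 115-day figure above can be checked directly. A quick sketch, using the thresholds as quoted (10^26 FLOP total, 10^19 FLOP per second of training speed):

```python
SECONDS_PER_DAY = 86_400

total_flop = 1e26      # hypothetical total-training-compute threshold
flop_per_sec = 1e19    # one order of magnitude below the 1e20 FLOP/s threshold

# Time to accumulate the total at that speed
days = total_flop / flop_per_sec / SECONDS_PER_DAY
print(f"{days:.1f} days")  # ~115.7 days, about 4 months
```

So a cluster an order of magnitude slower than the reporting threshold still reaches the total-compute threshold in about four months, which is the inconsistency being pointed at.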
Possibly. I wrote this as personal notes, originally, in full nested-list format. Then I spent 20 minutes removing some of the nested-list-ness in WordPress, which was very frustrating. I would definitely have organized it better if WordPress were less frustrating.
I did make a Google Doc version. Maybe the main lesson is that I should have edited it there.