One thing I’d like to see more of: attempts at voluntary compliance with proposed plans, and libraries and tools to support that.
I’ve seen suggestions to limit the compute power used on large training runs. Sounds great; might or might not be the answer, but if folks want to give it a try, let’s help them. Where are the libraries that make it super easy to report the compute power used on a training run? To show a Merkle tree of what other models or input data that training run depends on? (Or, if extinction risk isn’t your highest priority, to report which media by which people got incorporated, and what licenses it was used under?) How do those libraries support reporting by open-source efforts, and incremental reporting?
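As a concrete sketch of the Merkle-tree idea: the snippet below hashes a run's dependency records (parent models, datasets) into a single root that could be published alongside the model. All field names and the report format are invented for illustration; this is not any existing library's API.

```python
import hashlib
import json

def leaf_hash(item: dict) -> str:
    """Hash one dependency record (a parent model, a dataset, etc.)."""
    blob = json.dumps(item, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def merkle_root(hashes: list) -> str:
    """Pairwise-hash a list of leaf hashes up to a single root."""
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate last node on odd-sized levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Hypothetical dependency records and report format, purely illustrative.
deps = [
    {"type": "model", "name": "base-model-v1", "sha256": "placeholder"},
    {"type": "dataset", "name": "example-corpus", "license": "CC-BY-4.0"},
]
report = {
    "run_id": "example-run-001",
    "dependency_root": merkle_root([leaf_hash(d) for d in deps]),
    "dependencies": deps,
}
print(json.dumps(report, indent=2))
```

Someone auditing a derivative model could then verify a claimed dependency by re-hashing its record and checking it against the published root, without needing the full training setup.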
What if the plan is alarm bells and shutdowns of concerning training runs? Or you’re worried about model exfiltration by spies or rogue employees? Are there tools that make it easy to report what steps you’re taking to prevent that? That make it easy to provide good security against those threat models? Where’s the best practices guide?
We don’t have a complete answer. But we have some partial answers, or steps that might move in the right direction. And right now, actually taking those next steps looks hard for people on the margin, the ones on the fence about how to trade capabilities progress against security and alignment work. Or at least harder than it has to be.
(On a related note, I think the intersection of security and alignment is a fruitful area to apply more effort.)
Do you disagree with Apollo’s or ARC Evals’ approaches to voluntary compliance solutions?
I think neither. Or rather, I support their work, but that’s not quite what I had in mind with the above comment, unless there’s specific stuff they’re doing that I’m not aware of. (Which is entirely possible; I’m following this work only loosely, not in detail. If I’m missing something, I would be very grateful for specific links to stuff I should be reading. Git links to usable software packages would be great.)
What I’m looking for, mostly, is software tools that could be put to use: a library, a tutorial, a guide for incorporating that library into your training run, and, as a result, better compliance with voluntary reporting. What I’ve seen so far is mostly high-effort investigative reports and red-teaming efforts.
Best practices for evaluating models, and high-effort things you can do while making them, are also great. But I’m specifically looking for tools that enable low-effort compliance and reporting while people are doing the same stuff they otherwise would be. I think that would complement the suggestions for high-effort best practices.
The output I’d like to see is things like machine-parseable quantification of flops used to generate a model, such that a derivative model would specify both total and marginal flops used to create it.
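For instance, here's a minimal sketch of what such a record might look like, with a fine-tune carrying forward its parent's running total. The field names and numbers are invented for illustration, not a proposed standard.

```python
import json

def flops_record(model_id, marginal_flops, parent=None):
    """Build a machine-parseable flops record: a derivative model reports
    both its own (marginal) training cost and the inherited running total."""
    parent_total = parent["total_flops"] if parent else 0.0
    return {
        "model_id": model_id,
        "marginal_flops": marginal_flops,
        "total_flops": parent_total + marginal_flops,
        "parent": parent["model_id"] if parent else None,
    }

# Hypothetical numbers: a base model, then a much cheaper fine-tune of it.
base = flops_record("base-v1", 1.0e23)
finetune = flops_record("base-v1-chat", 5.0e20, parent=base)
print(json.dumps(finetune, indent=2))
```

The point of the total/marginal split is that a cheap fine-tune of an expensive model doesn't get to report itself as a small training run.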
I’m pretty confident the primary labs keep track of the number of flops used to train their models. I also don’t know how such a tool would prevent us all from dying.
I don’t know how it prevents us from dying either! I don’t have a plan that accomplishes that; I don’t think anyone else does either. If I did, I promise I’d be trying to explain it.
That said, I think there are pieces of plans that might help buy time, or might combine with other pieces to do something more useful. For example, we could implement regulations that take effect above a certain model size or training effort. Or that prevent putting too many flops worth of compute in one tightly-coupled cluster.
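To make the threshold idea concrete, here's a toy check using the common rough estimate of ~6 × parameters × tokens for dense-transformer training flops. The threshold value is a placeholder I made up for the sketch, not an actual or proposed regulatory line.

```python
def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Common rough estimate for dense-transformer training compute."""
    return 6.0 * n_params * n_tokens

THRESHOLD_FLOPS = 1e26  # placeholder value, not an actual regulatory line

def requires_reporting(n_params: float, n_tokens: float) -> bool:
    return estimated_training_flops(n_params, n_tokens) >= THRESHOLD_FLOPS

# 70B parameters on 2T tokens vs. 500B parameters on 40T tokens:
print(requires_reporting(70e9, 2e12))    # well under the placeholder threshold
print(requires_reporting(500e9, 40e12))  # over it
```

Even a trivial check like this only becomes useful if the flops numbers feeding it are reported in a standard, auditable way, which is the tooling gap I'm pointing at.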
One problem with implementing those regulations is that there’s disagreement about whether they would help. But that’s not the only problem. Others include: how hard would they be to comply with, and to audit? Is compliance even possible in an open-source setting? Will those open questions get used as excuses to oppose the regulations by people who actually object for other reasons?
And then there’s the policy question of how we move from the no-regulations world of today to a world with useful regulations, assuming that’s a useful move. So the question I’m trying to attack is: what’s the next step in that plan? Maybe we don’t know because we don’t know what the complete plan is or whether the later steps can work at all, but are there things that look likely to be useful next steps that we can implement today?
One set of answers to that starts with voluntary compliance. Signing an open letter creates common knowledge that people think there’s a problem. Widespread voluntary compliance provides common knowledge that people agree on a next step. But before the former can happen, someone has to write the letter and circulate it and coordinate getting signatures. And before the latter can happen, someone has to write the tools.
So a solutionism-focused approach, as called for by the post I’m replying to, is to ask what the next step is, and when the answer isn’t yet actionable, to break it down further until it is. My suggestion was intended as one small step of many, one that I haven’t seen discussed much as a useful next step.