In terms of speeding up AI development, not building anything > building something and keeping it completely secret > building something that your competitors learn about > building something and generating public hype about it via demos > building something with hype and publicly releasing it to users & customers.
I think it is very helpful, and healthy for the discourse, to make this distinction. I agree that many of these things might get lumped together.
But also, I want to flag the possibility that something can be very very bad to do, even if there are other things that would have been progressively worse to do.
I want to make sure that groups get the credit that is due to them when they do good things against their incentives.
I also want to avoid falling into a pattern of thinking “well they didn’t do the worst thing, or the second worst thing, so that’s pretty good!” if in isolation I would have thought that action was pretty bad / blameworthy.
As of this moment, I don’t have a particular opinion one way or the other about how good or bad Anthropic’s release policy is. I’m merely making the abstract point at this time.
Yeah, I agree with all of this, seems worth saying. Now to figure out the object level… 🤔
That’s the hard part.
My guess is that training cutting-edge models and not releasing them is a pretty good play, or would have been if there weren’t huge AGI hype.
As it is, information about your models is going to leak, and in most cases the fact that something is possible is most of the secret to reverse engineering it (note: this might be true in the regime of transformer models, but it might not be true for other tasks or sub-problems).
But on the other hand, given the hype, people are going to try to do the things that you’re doing anyway, so maybe leaks about your capabilities don’t make that much difference?
This does point out an important consideration, which is “how much information needs to leak from your lab to enable someone else to replicate your results?”
It seems like, in many cases, there’s an obvious way to do some task, and the mere fact that you succeeded is enough info to recreate your result. But presumably there are cases where you figure out a clever trick, and even if the evidence of your model’s performance leaks, that doesn’t tell the world how to do it (though it does cause maybe hundreds of smart people to start looking for how you did it, trying to discover how to do it themselves).
I think I should regard the situation differently depending on the status of that axis.