If GPT-4 was smart enough to manipulate ARC employees into manipulating TaskRabbits into helping it escape… it already had been talking to thousands of much less cautious employees of various other companies (including people at OpenAI who actually had access to the weights unlike ARC) for much longer, so it already would have escaped.
What ARC did is the equivalent of tasting it in a BSL4 lab. I mean, their security could probably still be improved, but again I emphasize that the thing was set to be released in a few weeks anyway and would have been released unless ARC found something super dangerous in this test. And I’m sure they will further improve their security as they scale up as an org and as models become more dangerous.
The taskrabbit stuff was not unethical, their self-replication experiment was not borderline suicidal. As for precedents, what ARC is doing is a great precedent because currently the alternative is not to test this sort of thing at all before deploying.
“ What ARC did is the equivalent of tasting it in a BSL4 lab. ”
I don’t see how you could believe that. It wasn’t tested on a completely airgapped machine inside a faraday cage e.g. I’m fact just the opposite right, with uninformed humans and on cloud servers.
It’s all relative. ARCs security was way stronger than the security GPT-4 had before and after ARC’s evals. So for GPT-4, beginning ARC testing was like a virus moving from a wet market somewhere to a BSL-4 lab, in terms of relative level of oversight/security/etc. I agree that ARCs security could still be improved—and they fully intend to do so.
If GPT-4 was smart enough to manipulate ARC employees into manipulating TaskRabbits into helping it escape… it already had been talking to thousands of much less cautious employees of various other companies (including people at OpenAI who actually had access to the weights unlike ARC) for much longer, so it already would have escaped.
What ARC did is the equivalent of tasting it in a BSL4 lab. I mean, their security could probably still be improved, but again I emphasize that the thing was set to be released in a few weeks anyway and would have been released unless ARC found something super dangerous in this test. And I’m sure they will further improve their security as they scale up as an org and as models become more dangerous.
The taskrabbit stuff was not unethical, their self-replication experiment was not borderline suicidal. As for precedents, what ARC is doing is a great precedent because currently the alternative is not to test this sort of thing at all before deploying.
“ What ARC did is the equivalent of tasting it in a BSL4 lab. ”
I don’t see how you could believe that. It wasn’t tested on a completely airgapped machine inside a faraday cage e.g. I’m fact just the opposite right, with uninformed humans and on cloud servers.
It’s all relative. ARCs security was way stronger than the security GPT-4 had before and after ARC’s evals. So for GPT-4, beginning ARC testing was like a virus moving from a wet market somewhere to a BSL-4 lab, in terms of relative level of oversight/security/etc. I agree that ARCs security could still be improved—and they fully intend to do so.