Oh, you are talking about the taskrabbit people? So you’d be fine with it if they didn’t use taskrabbits?
Note that the model wasn’t given unfettered access to the TaskRabbit workers: the model sent text to an ARC employee, who relayed it to the TaskRabbit worker, and so forth. At no point could anything actually bad have happened, because the ARC employee involved wouldn’t have passed on the relevant message.
As for extremely dangerous… what are you imagining? I’m someone who thinks the chance of an AI-induced existential catastrophe is around 80%, so believe me I’m very aware of the dangers of AI, but I’d assign far far far less than 1% chance to scenarios in which this happens specifically due to an ARC test going awry. And more than 1% chance to scenarios in which ARC’s testing literally saves the world, e.g. by providing advance warning that models are getting scarily powerful, resulting in labs slowing down and being more careful instead of speeding up and deploying.
This is a bizarre comment. Isn’t a crucial point in these discussions that humans can’t really understand an AGI’s plans? If so, how do you expect an ARC employee to accurately determine which messages sent to TaskRabbit would actually be dangerous? We’re bordering on “they’d just shut the AI off if it was dangerous” territory here. I’m less concerned about the TaskRabbit stuff, which at minimum was probably unethical, but their self-replication experiment on a cloud service strikes me as borderline suicidal. I don’t think GPT-4 is actually dangerous, but GPT-6 might be, and I would expect that running this test on an actually dangerous system would be game over, so it’s a terrible precedent to set.
Imagine someone discovered a new strain of Ebola and wanted to see if it was likely to spawn a pandemic. Do you think a good/safe test would be to take it into an airport, spray it around baggage check, and wait to see if a pandemic happens? Or would it be safer to test it in a Biosafety Level 4 (BSL-4) lab?
If GPT-4 was smart enough to manipulate ARC employees into manipulating TaskRabbit workers into helping it escape… it had already been talking to thousands of much less cautious employees of various other companies (including people at OpenAI who, unlike ARC, actually had access to the weights) for much longer, so it already would have escaped.
What ARC did is the equivalent of testing it in a BSL-4 lab. I mean, their security could probably still be improved, but again I emphasize that the model was set to be released in a few weeks anyway and would have been released unless ARC found something super dangerous in this test. And I’m sure they will further improve their security as they scale up as an org and as models become more dangerous.
The TaskRabbit stuff was not unethical, and their self-replication experiment was not borderline suicidal. As for precedents, what ARC is doing sets a great precedent, because currently the alternative is not to test this sort of thing at all before deploying.
“What ARC did is the equivalent of testing it in a BSL-4 lab.”
I don’t see how you could believe that. It wasn’t tested on a completely airgapped machine inside a Faraday cage, for example. In fact, just the opposite, right? It was run with uninformed humans and on cloud servers.
It’s all relative. ARC’s security was way stronger than the security GPT-4 had before and after ARC’s evals. So for GPT-4, beginning ARC testing was like a virus moving from a wet market somewhere to a BSL-4 lab, in terms of relative level of oversight/security/etc. I agree that ARC’s security could still be improved, and they fully intend to do so.