For context, here are the top comments on the Reddit thread. I don’t feel that any of these are well interpreted as “taking the ARC eval seriously”, so I’m not sure where this impression comes from. Maybe there were other comments that were upvoted when you read this? I haven’t found a single comment that directly addresses what the ARC eval means (just some discussion about whether the model actually succeeded at deceiving a TaskRabbit, since the paper is quite confusing on this).
“Not an endorsement” is pro forma cover-my-ass boilerplate. Something like “recommends against deployment” would represent legitimate concerns. I don’t think there’s anything here.
Imagine if we did create an AI super virus… Maybe not this one, but eventually, the AI does break out into the wild, spreading through the networks using iterative natural selection to find optimal ways to preserve itself through coding safeguards. I can imagine a scenario where it finds ways to basically create a very fluid botnet of itself forever existing in the digital aether, constantly cloning and optimizing itself… Naturally selecting a more and more efficient form of itself to evade detection until it gets to the point that it sort of is some abstract life existing through all these networked computers, unable to be stopped.
I’m glad they’re testing things like this in a controlled setting. They found that GPT-4 was ineffective at all the tasks mentioned, like self-replication, avoiding deletion, and wealth creation.
Even without these concerns, I think this is all moving way too fast. I am equal parts fascinated/excited and horrified/frightened by GPT-4. It’s very disorienting.
I do feel a sense of nihilism seeping into my consciousness when I realize we can’t possibly compete with AI. Yes, we can leverage them to empower ourselves, but at the same time we devalue our own thinking, and the more we rely on AI, the more our thinking and ancillary skills like writing will atrophy and decline.
It does feel like being on the brink of the singularity, with that moment of hesitation, not knowing what we will find on the other side.
The irony of releasing this paper is that it will almost certainly become part of the training dataset for the next model, especially now that it’s on Reddit, which I believe is part of its dataset. That means if this thing ever does become sentient, it will know what tests are being done to measure its sentience. These tests are now useless.
Not judging OP for posting here or anything of course. Just pointing out that the second the agency published their methods online they compromised the test.
A.I. is the predestined successor of mankind. Step aside monkey men, the hour of your extinction has come.
I really don’t get an “if ARC had recommended against publishing, these people would care” vibe from this.
The point is not what Reddit commenters think, the point is what OpenAI thinks. I read OP (and the original source) as saying that if ARC had indicated that release was unsafe, then OpenAI would not have released the model until it could be made safe.
Also, something that I think is worth checking out is this Reddit thread on r/ChatGPT discussing the ARC eval. It seems that people are really taking the ARC eval seriously. In this situation, ARC did not recommend against deployment, but it seems like if they had, lots of people in fact would have found it quite concerning.
This reads to me as clearly referring to the Reddit comments as evidence that “if ARC had recommended against deployment, lots of people [redditors] would have been quite concerned”.