I’m skeptical that hard takeoff is something to worry about anytime soon, but setting that aside, I think it’s extremely valuable to think through these questions of organizational culture. There are a lot of harms that can come from mere AI (as opposed to AGI), and all of these reflections pertain to less exotic but still pressing concerns about trustworthy AI.
These reflections very nicely cover what is hard about self-regulation, particularly in for-profit organizations. What is missing, though, is the constitutive role of the external regulatory environment in the risk management structures and practices an organization adopts. Legislation and regulations create regulatory risks (financial penalties and public embarrassment for breaking the rules) that force companies, from the outside, to create the cultures and organs of responsibility this post describes. It is external force, and probably only external force, that creates these internal shapes.
To put this point in the form of a prediction: show me a company with highly developed risk management practices and a culture of responsibility, and I will show you the government regulations that organization is answerable to. (This won’t be true all of the time, but it will be true for most high-reliability organizations (HROs), for banking (in non-US G20 countries, at least), for biomedical research, and for other areas.)
In fairness, I have to acknowledge that specific regulations for AI are not here yet, but they are coming soon. Pure self-regulation of AI companies is probably a futile goal. By contrast, operating under a sane, stable, coherent regulatory environment would actually bring a lot of advantages to every company working on AI.
Thanks for sharing the link to ARC. It seems to me the kinds of things they are testing for and worried about are analogous to the risks of self-driving cars: when you incorporate ML systems into a range of human activities, their behaviour is unpredictable and can be dangerous. I am glad ARC is doing the work they are doing. People are using unpredictable tools and ARC is investigating the risks. That’s great.
I don’t think the capabilities ARC is looking at are “similar” to runaway intelligence, as you suggest. They clearly do not require it. They are far more mundane (but dangerous nonetheless, as you rightly point out).
At one point in the ARC post, they hint vaguely at being motivated by Yudkowsky-like worries: “As AI systems improve, it is becoming increasingly difficult to rule out that models might be able to autonomously gain resources and evade human oversight – so rigorous evaluation is essential.” They seem to be imagining a system giving itself goals, such that it is motivated to engage in tactical deception to carry out its goals—a behaviour we find in a range of problem-solving non-human animals. It strikes me as a worry that is extraneous to the good work ARC is doing. And the end of the quote is odd, since rigorous evaluation is clearly essential regardless of autonomous resource gains or oversight evasion.