What I find incredible is that contributing to the development of existentially dangerous systems is viewed as morally acceptable within communities that, on paper, accept that AGI is a threat.
Both OpenAI and Anthropic are incredibly influential among AI safety researchers, despite being key players in bringing the advent of TAI ever closer.
Both organisations benefit from lexical confusion over the word “safety”.
The average person concerned with existential risk from AGI might assume “safety” means working to reduce the likelihood that we all die. They would be disheartened to learn that many “AI Safety” researchers are instead focused on making sure contemporary LLMs behave appropriately. Such “safety” research simply makes the contemporary technology more viable and profitable, driving investment and reducing timelines. There is, to my knowledge, no published research demonstrating that these techniques will extend to controlling AGI in a useful way.*
OpenAI’s “Superalignment” plan is a more ambitious safety play. Their plan to “solve” alignment involves building a human-level general intelligence within 4 years and then using it to automate alignment research.
But there are two obvious problems:
- A human-level general intelligence is already most of the way toward a superhuman general intelligence (simply give it more compute). Cynically, Superalignment is a promise that OpenAI’s brightest safety researchers will be trying their hardest to bring about an AGI within 4 years.
- Even if Superalignment succeeds, we are in the position of trusting that a for-profit, private entity will use its human-level AI researchers only to research safety, instead of making the incredibly obvious play of having the virtual researchers work out how to build the next generation of better, smarter automated researchers.
To conclude, if it looks like a duck, swims like a duck and quacks like a duck, it’s a capabilities researcher.
*This point could (and probably should) be a post in itself. Why wouldn’t techniques that work on contemporary AI systems extend to AGI?
Pretend for a moment that you and I are silicon-based aliens who have recently discovered that carbon-based lifeforms exist and can be used to run calculations. Scientists have postulated that by creating complex enough carbon structures we could invent “thinking animals”. We anticipate that these strange creatures will be built in the near future and that they might be difficult to control.
As we can’t build thinking animals today, we are stuck studying single-celled carbon organisms. A technique has just been discovered in which we can use a compound called “sugar” to influence the direction in which these simple organisms move.
Is it reasonable to then conclude that you will be able to predict and control the behaviour of a much more complex, multicellular creature called a “human” by spreading sugar out on the ground?
On the OpenPhil / OpenAI Partnership
Epistemic Note:
The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil.
(Both title and this note have been edited, cheers to Ben Pace for very constructive feedback.)
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2:
This was the default outcome.
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break those commitments later (and limited ways to enforce the original commitment).
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved in Open Phil’s recommendation that a $30 million grant be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision-making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote OpenPhil:
“OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”