Another nice example of “sound[ing] like a human being” is Stuart Russell’s explanation of “the gorilla problem” in the book Human Compatible. Quoting directly from the start of chapter 5:
It doesn’t require much imagination to see that making something smarter than yourself could be a bad idea. We understand that our control over the environment and over other species is a result of our intelligence, so the thought of something else being more intelligent than us—whether it’s a robot or an alien—immediately induces a queasy feeling.
Around ten million years ago, the ancestors of the modern gorilla created (accidentally, to be sure) the genetic lineage leading to modern humans. How do the gorillas feel about this? Clearly, if they were able to tell us about their species’ current situation vis-à-vis humans, the consensus opinion would be very negative indeed. Their species has essentially no future beyond that which we deign to allow. We do not want to be in a similar situation vis-à-vis superintelligent machines. I’ll call this the gorilla problem—specifically, the problem of whether humans can maintain their supremacy and autonomy in a world that includes machines with substantially greater intelligence.
Great post. I’m on GDM’s new AI safety and alignment team in the Bay Area and hope readers will consider joining us!
What evidence is there that working at a scaling lab risks creating a “corrupted” perception? When I try to think of examples, the people who come to mind seem to have transitioned quite successfully from working at a scaling lab to nonprofit or government work. For example:
Paul Christiano went from OpenAI to the nonprofit Alignment Research Center (ARC) to head of AI safety at the US AI Safety Institute.
Geoffrey Irving worked at Google Brain, OpenAI, and Google DeepMind, and is now Chief Scientist at the UK AI Safety Institute.
Beth Barnes worked at DeepMind and OpenAI and is now founder and head of research at Model Evaluation and Threat Research (METR).