My personal view is that given all of this history and the fact that this forum is named the “AI Alignment Forum”, we should not redefine “AI Alignment” to mean the same thing as “Intent Alignment”. I feel like to the extent there is confusion/conflation over the terminology, it was mainly due to Paul’s (probably unintentional) overloading of “AI alignment” with the new and narrower meaning (in Clarifying “AI Alignment”), and we should fix that error by collectively going back to the original definition, or in some circumstances where the risk of confusion is too great, avoiding “AI alignment” and using some other term like “AI x-safety”. (Although there’s an issue with “existential risk/safety” as well, because it covers problems that aren’t literally existential, e.g., where humanity survives but its future potential is greatly curtailed. Man, coordination is hard.)
I feel like to the extent there is confusion/conflation over the terminology, it was mainly due to Paul’s (probably unintentional) overloading of “AI alignment” with the new and narrower meaning (in Clarifying “AI Alignment”)
I don’t think this is the main or only source of confusion:
MIRI folks also frequently used the narrower usage. I think the first time I saw “aligned” was in Aligning Superintelligence with Human Interests from 2014 (scraped by the Wayback Machine on January 3, 2015), which says: “We call a smarter-than-human system that reliably pursues beneficial goals ‘aligned with human interests’ or simply ‘aligned.’”
Virtually every problem people discussed as part of AI alignment was also part of intent alignment. The name was deliberately chosen to evoke “pointing” your AI in a direction. Even in the linked post Eliezer uses “pointing the AI in the right direction” as a synonym for alignment.
It was proposed to me as a replacement for the narrower term AI control, which quite obviously doesn’t include all the broader stuff. In the email thread where Rob suggested I adopt it, he said it referred to what Nick Bostrom called the “second principal-agent problem” between AI developers and the AI they build.
the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good outcomes in the real world
I want to emphasize again that this definition seems extremely bad. A lot of people think their work helps AI actually produce good outcomes in the world when run, so pretty much everyone would think their work counts as alignment.
It includes all work in AI ethics, if in fact that research is helpful for ensuring that future AI has a good outcome. It also includes everything people work on in AI capabilities, if in fact capability increases improve the probability that a future AI system produces good outcomes when run. It’s not even restricted to safety, since it includes realizing more upside from your AI. It includes changing the way you build AI to help address distributional issues, if the speaker (very reasonably!) thinks those are important to the value of the future. I didn’t take this seriously as a definition and didn’t really realize anyone was taking it seriously; I thought it was just an instance of speaking loosely.
But if people are going to use the term this way, I think at a minimum they cannot complain about linguistic drift when “alignment” can mean practically anything at all. Obviously people are going to disagree about what AI features lead to “producing good outcomes.” Almost every time I see a definitional argument, it’s people (including Eliezer) objecting that “alignment” includes too much stuff and should be narrower, but that is obviously not going to be improved by adopting an absurdly broad definition.