Richard_Ngo comments on What’s Up With Confusingly Pervasive Goal Directedness?

Richard_Ngo Jan 21, 2022, 5:46 PM
7 points
I agree that highly agentic versions of the system will complete the tasks better. My claim is just that they’re not necessary to complete the task very well, and so we shouldn’t be confident that selection for completing that task very well will end up producing the highly agentic versions.
- Martin Randall Jan 22, 2022, 3:01 PM
  5 points
  That helps, thanks. Raemon says:
  
  The part where alignment is hard is precisely when the thing I’m trying to accomplish is hard. Because then I need a powerful plan, and it’s hard to specify a search for powerful plans that don’t kill everyone.
  
  I now read you as pointing to chess as:
  - It is “hard to accomplish” from the perspective of human cognition.
  - It does not require a “powerful”/“agentic” plan.
  - It’s “easy” to specify a search for a good plan, we already did it.
  So maybe alignment is like that.
  - Richard_Ngo Jan 22, 2022, 4:21 PM
    4 points
    Yepp. And clearly alignment is much harder than chess, but it seems like an open question whether it’s harder than “kill everyone” (and even if it is, there’s an open question of how much of an advantage we get from doing our best to point the system at the former, not the latter).
    - Zack_M_Davis Jan 22, 2022, 5:28 PM
      6 points
      
      whether it’s harder than “kill everyone”
      
      “Kill everyone” seems like it should be “easy”, because there are so many ways to do it: humans only survive in environments with a specific range of temperatures, pressures, atmospheric contents, availability of human-digestible food, &c.