rank-biserial answers Convince me that humanity is as doomed by AGI as Yudkowsky et al., seems to believe

rank-biserial 11 Apr 2022 8:16 UTC
7 points
I find point no. 4 weak.
> 4. Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
I worry that when people reason about utility functions, they’re relying upon the availability heuristic. When people try to picture “a random utility function”, they’re heavily biased in favor of the kind of utility functions they’re familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.

How do we know that a random sample from utility-function-space looks anything like the utility functions we’re familiar with? We don’t. I wrote a very short story to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
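
A minimal sketch of the retroactive-fitting point, assuming behavior is recorded as a finite sequence of discrete actions. The names `fit_utility` and `is_optimal` are hypothetical and not taken from any library or from the linked posts. The construction assigns utility 1 to exactly the observed history and 0 to every other history, so whatever the agent did comes out as "utility-maximizing":

```python
from typing import Callable, Iterable, Sequence, Tuple

History = Tuple[str, ...]


def fit_utility(observed: Sequence[str]) -> Callable[[History], float]:
    """Build a utility function that the observed action sequence maximizes.

    Assumption: utility is defined over whole action histories.
    """
    target = tuple(observed)

    def utility(history: History) -> float:
        # 1 for the exact history we saw, 0 for anything else.
        return 1.0 if tuple(history) == target else 0.0

    return utility


def is_optimal(
    utility: Callable[[History], float],
    observed: Sequence[str],
    alternatives: Iterable[Sequence[str]],
) -> bool:
    """Check that no alternative history scores higher than the observed one."""
    best = max(utility(tuple(h)) for h in alternatives)
    return utility(tuple(observed)) >= best


# Usage: an arbitrary, seemingly pointless behavior still "maximizes" something.
twitching = ["left", "right", "left"]
u = fit_utility(twitching)
options = [["left", "right", "left"], ["left", "left", "left"], ["right", "right", "right"]]
print(is_optimal(u, twitching, options))
```

The fitted function is built only after the behavior is known, so it says nothing about what the agent will do next. That is the gap in predictive power the question above is pointing at.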
- Rob Bensinger 11 Apr 2022 19:06 UTC
  5 points
  Parent
  > If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
  Coherence arguments imply a force for goal-directed behavior.
  - rank-biserial 11 Apr 2022 19:27 UTC
    1 point
    Parent
    I endorse Rohin Shah’s response to that post.
    
    > You might think “well, obviously the superintelligent AI system is going to care about things, maybe it’s technically an assumption but surely that’s a fine assumption”. I think on balance I agree, but it doesn’t seem nearly so obvious to me, and seems to depend on how exactly the agent is built. For example, it’s plausible to me that superintelligent expert systems would not be accurately described as “caring about things”, and I don’t think it was a priori obvious that expert systems wouldn’t lead to AGI. Similarly, it seems at best questionable whether GPT-3 can be accurately described as “caring about things”.
    - Rob Bensinger 11 Apr 2022 22:04 UTC
      3 points
      Parent
      This seems like a very different position from the one you just gave:
      > I worry that when people reason about utility functions, they’re relying upon the availability heuristic. When people try to picture “a random utility function”, they’re heavily biased in favor of the kind of utility functions they’re familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.
      >
      > How do we know that a random sample from utility-function-space looks anything like the utility functions we’re familiar with? We don’t. I wrote a very short story to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
      I took you to be saying, ‘You can retroactively fit a utility function to any sequence of actions, so we gain no predictive power by thinking in terms of utility functions or coherence theorems at all. People worry about paperclippers not because there are coherence pressures pushing optimizers toward paperclipper-style behavior, but because paperclippers are a vivid story that sticks in your head.’