I disagree somewhat. It is in principle possible to have an AI with a utility function such that the single optimum it reaches for is actually really nice. Most random utility functions are bad, but there are a few good ones.
Suppose a maximally indifferent AI: U(world) = 3, i.e. constant. Whatever happens, the AI gets utility 3. It doesn't care in the slightest about anything. How the AI behaves depends entirely on the tiebreaker mechanism.
Just because the AI is steering towards a huge range of worlds, some of them good, doesn't mean we get a good outcome. It has "no incentive" to trick its creators, but also no incentive not to. You have specified an AI that imagines a trillion trillion paths through time. Some good. Many not. Then it uses some undefined piece of code to pick one. Until you specify how this works, we can't tell if the outcome will be good.
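A minimal sketch of that point in Python (the toy worlds, the function names, and the constant 3 are mine, purely illustrative): with a constant utility every candidate world ties, so whatever tie-breaking code sits at the end is doing all of the choosing.

```python
import random
from typing import Callable, Sequence

def constant_utility(world: str) -> float:
    """U(world) = 3 for every world: the maximally indifferent agent."""
    return 3.0

def pick_target_world(worlds: Sequence[str],
                      utility: Callable[[str], float],
                      tiebreak: Callable[[Sequence[str]], str]) -> str:
    best = max(utility(w) for w in worlds)
    tied = [w for w in worlds if utility(w) == best]
    # With a constant utility, every world ties, so the "choice" is made
    # entirely by whichever tiebreak rule happens to be plugged in here.
    return tiebreak(tied)

worlds = ["status quo", "paperclips everywhere", "flourishing civilisation"]
print(pick_target_world(worlds, constant_utility, lambda ws: ws[0]))  # always the first option
print(pick_target_world(worlds, constant_utility, random.choice))     # whatever it lands on
```

Swap in a different tiebreak and you get a different agent, even though the utility function never changed.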
My primary moral is to resist the temptation to generalize over all of mind design space.
If we focus on the bounded subspace of mind design space which contains all those minds whose makeup can be specified in a trillion bits or less, then every universal generalization that you make has two to the trillionth power chances to be falsified.
Conversely, every existential generalization—“there exists at least one mind such that X”—has two to the trillionth power chances to be true.
So you want to resist the temptation to say either that all minds do something, or that no minds do something.
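A rough restatement of the counting argument in the quoted passage, with notation of my own (\mathcal{M} for the bounded subspace of minds, X for the property being generalized):

```latex
% Minds specifiable in a trillion bits or less: at most one bit string each.
\[
  |\mathcal{M}| \;\le\; \sum_{k=0}^{10^{12}} 2^{k} \;=\; 2^{10^{12}+1}-1 \;\approx\; 2^{10^{12}}
\]
% A universal claim must survive every member, so it has that many chances to be falsified:
\[
  \big(\forall m \in \mathcal{M}.\; X(m)\big) \text{ is refuted by a single counterexample.}
\]
% Dually, an existential claim needs only one witness among the same candidates:
\[
  \big(\exists m \in \mathcal{M}.\; X(m)\big) \text{ is established by a single witness.}
\]
```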
There are some states of the world you would consider good, so the utility functions that aim for those states are good too. There are utility functions that rate X as bad and Y as good to exactly the same extent that you do.
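A toy way to make that existence claim concrete, continuing the Python sketch above (X, Y, and the numeric scores are placeholders I chose): such a utility function can be written down simply by copying the human judgement.

```python
# Placeholder outcomes and scores: how bad or good *you* judge X and Y to be.
human_judgement = {"X": -10.0, "Y": +10.0}

def matching_utility(outcome: str) -> float:
    """A utility function that rates outcomes exactly as the human does."""
    return human_judgement[outcome]

assert matching_utility("X") < 0 < matching_utility("Y")
```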
*Citation Needed
https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general