> Conversely, if FDR wants a chicken in every pot, and then finds out that chickens don’t exist, he would change his values to want a beef roast in every pot, or some such.
How could it possibly deduce that without reference to some real-world effect? There is no reason, a priori, to prefer one sort to another. That preference itself involves valuing something: coming to the conclusion with fewer calculations (of what kind?), in less time (or is more time better, or a more consistent amount of time?), or with less risk of error. And the same applies to any other change: knowing which version is better requires both a measurement system and an evaluation of each candidate under it. And for any novel problem, the answer, by definition, won’t be available for lookup.
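To make that last point concrete, here is a minimal sketch (my own toy example, not something from the exchange above; the two routines and the two metrics are arbitrary choices): two textbook sorting procedures evaluated under two different measurement systems, comparison count and wall-clock time. Neither is “better” in the abstract; an answer only exists once you pick a metric and actually evaluate each procedure under it.

```python
import time

def insertion_sort(xs):
    """Sort a copy of xs, counting element comparisons as one possible metric."""
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return xs, comparisons

def selection_sort(xs):
    """Sort a copy of xs, counting element comparisons as one possible metric."""
    xs = list(xs)
    comparisons = 0
    for i in range(len(xs)):
        smallest = i
        for j in range(i + 1, len(xs)):
            comparisons += 1
            if xs[j] < xs[smallest]:
                smallest = j
        xs[i], xs[smallest] = xs[smallest], xs[i]
    return xs, comparisons

data = [5, 1, 4, 1, 5, 9, 2, 6, 5, 3]

for sort_fn in (insertion_sort, selection_sort):
    start = time.perf_counter()
    _, comparisons = sort_fn(data)
    elapsed = time.perf_counter() - start
    # Two different "measurement systems": comparison count and wall-clock time.
    # Which procedure is "better" depends entirely on which metric you chose.
    print(f"{sort_fn.__name__}: {comparisons} comparisons, {elapsed:.6f} s")
```

Swap in a different measurement system (memory, code size, worst-case behaviour) and the ranking can change with it; “better” is only defined relative to some metric you already value.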
The goals of an AGI are not uniformly drawn from all possible goals.
> Conversely, if FDR wants a chicken in every pot, and then finds out that chickens don’t exist, he would change his values to want a beef roast in every pot, or some such.
I do not believe his value function is “a chicken in every pot”. It’s likely closer to ‘I don’t want anyone to be unable to feed themselves’, although even this is likely an over-approximation of the true utility function. ‘A chicken in every pot’ is one way of doing well on said utility function. If he found out that chickens didn’t exist, the ‘next best thing’ might be a roast beef in every pot, or some such. This is not a change to the value function itself, merely to the optimum[1] solution.
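As a minimal sketch of the distinction I’m drawing (the outcomes and scores below are toy values I made up, not a claim about anyone’s actual values): the utility function stays fixed throughout; removing chickens from the set of feasible outcomes only moves which outcome comes out on top.

```python
# Toy illustration: a fixed utility function, with only the feasible set changing.
# The scores are arbitrary placeholders.

def utility(outcome: str) -> float:
    """Stand-in for something like 'I don't want anyone unable to feed themselves'."""
    scores = {
        "a chicken in every pot": 1.00,
        "a roast beef in every pot": 0.95,
        "nothing in most pots": 0.10,
    }
    return scores[outcome]

def best_plan(feasible_outcomes):
    # The agent optimizes the *same* utility function over whatever is feasible.
    return max(feasible_outcomes, key=utility)

with_chickens = ["a chicken in every pot", "a roast beef in every pot", "nothing in most pots"]
without_chickens = ["a roast beef in every pot", "nothing in most pots"]

print(best_plan(with_chickens))     # a chicken in every pot
print(best_plan(without_chickens))  # a roast beef in every pot
# `utility` itself is untouched; only the optimum over the feasible set moved.
```

Nothing about `utility` was edited to get the second answer; the world changed, not the values.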
If FDR’s true value function was literally “a chicken in every pot”, with no tiebreaker, then he has no incentive to change his values, and a weak incentive not to change them (after all, it’s possible that everyone was mistaken, or that he could invent chickens).
If FDR’s true value function was e.g. “a chicken in every pot, or barring that some other similar food”, then again he has no incentive to change his values. He may lean toward ‘ok, it’s very unlikely that chickens exist, so it’s better in expected value to work towards a roast beef in every pot’, but that again hasn’t changed the underlying utility function.
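To put rough numbers on that expected-value step (again a toy sketch; the probability and payoffs are assumptions of mine, reusing the scores from the previous snippet): a small chance that chickens exist after all still leaves the roast-beef plan as the better bet, and at no point does the utility function itself get edited.

```python
# Toy expected-value comparison under uncertainty about whether chickens exist.
P_CHICKENS_EXIST = 0.01  # made-up number, purely for illustration

def expected_value(plan: str) -> float:
    if plan == "work towards a chicken in every pot":
        # Pays off only if it turns out everyone was mistaken and chickens exist.
        return P_CHICKENS_EXIST * 1.00 + (1 - P_CHICKENS_EXIST) * 0.10
    if plan == "work towards a roast beef in every pot":
        return 0.95  # achievable either way in this toy setup
    raise ValueError(f"unknown plan: {plan}")

plans = [
    "work towards a chicken in every pot",
    "work towards a roast beef in every pot",
]
print(max(plans, key=expected_value))  # work towards a roast beef in every pot
```

In this setup the chicken plan only wins once P_CHICKENS_EXIST rises above roughly 0.94; below that the preferred plan changes, but the underlying utility numbers never do.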
[1] This isn’t likely to be the optimum, but it is at least a ‘good’ point.