Perhaps the optimal configuration for utility per unit of matter, under this utility function, happens to be a tiny molecular structure shaped roughly like a paperclip.
I think this is very improbable, but thanks for the quote. Not sure if it addresses my question?
Yudkowsky & I would of course agree that that is very improbable. It’s just an example.
The point I was making with this quote is that the question you are asking is a Big Old Unsolved Problem in the literature. If we had any idea what sort of utility function the system would end up with, that would be great and an improvement over the status quo. Yudkowsky’s point in the quote is that it’s a complicated multi-step process we currently don’t have a clue about; it’s not nearly as simple as “the system will maximize reward.” A much better story would be “The system will maximize some proxy, which will gradually evolve via SGD to be closer and closer to reward, but at some point it’ll get smart enough to pursue reward directly for instrumental-convergence reasons, and at that point its proxy goal will crystallize.” But this story is also way too simplistic. And it doesn’t tell us much at all about what the proxy will actually look like, because so much depends on the exact order in which various things are learned.
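(A deliberately crude toy, and my own sketch rather than anything from Yudkowsky or the quote: if we flatten “a proxy that SGD gradually nudges toward reward” down to a simple regression problem, the proxy is a limited-capacity model, so it tracks the reward signal imperfectly and only approaches it over training. The function names and numbers below are all made up for illustration.)

```python
import numpy as np

# Toy sketch (not a model of real training dynamics): a "reward" function the
# training signal scores, and a learned "proxy" objective that SGD nudges
# toward it. The only point illustrated is that the proxy tracks reward
# imperfectly and approaches it gradually, so what it "values" mid-training
# depends on where in the training trajectory you look.

rng = np.random.default_rng(0)

def true_reward(x):
    # The objective the training signal actually rewards.
    return np.sin(3 * x) + 0.5 * x

def features(x):
    # The proxy is a small linear model over hand-picked features, so it can
    # only approximate the reward, never equal it.
    return np.stack([x, x**2, np.sin(2 * x)], axis=-1)

theta = rng.normal(size=3)   # proxy parameters
lr = 0.05

for step in range(2001):
    x = rng.uniform(-2.0, 2.0, size=64)      # states visited during training
    phi = features(x)
    err = phi @ theta - true_reward(x)       # proxy minus reward
    theta -= lr * (phi.T @ err) / len(x)     # one SGD step toward the reward signal
    if step % 500 == 0:
        print(f"step {step:4d}  mean squared proxy-reward gap: {np.mean(err**2):.3f}")
```

The gap shrinks but never hits zero, which is the whole point of the analogy: the proxy ends up near the reward signal, not identical to it, and nothing in this picture tells you which proxy you get.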
I should have made it just a comment, not an answer.
“because so much depends on the exact order in which various things are learned.”
I actually doubt that claim in its stronger forms. I think there’s some substantial effect, but, e.g., whether a child loves their family doesn’t depend strongly on the precise curriculum in grade school.
Yet whether a child grows up to work on x-risk reduction vs. homeless shelters vs. voting Democrats out of office vs. voting Republicans out of office does often depend on the precise curriculum in high school and college.
(I think we are in agreement here. I’d be interested to hear whether you can point to any particular value the AGI will probably have, or (weaker) any particular value such that whether the AGI ends up with it doesn’t depend strongly on the curriculum, the order in which concepts are learned, etc.)