What does it mean to ask “how hard should I optimize”?
Satisficing threshold, probability of the plan’s success, the plan’s robustness to unexpected perturbations, etc. I suppose the argmin is somewhat misleading: the GPS doesn’t output the best possible plan for achieving some goal in the world outside the agent; it solves the problem in the most efficient way possible, which often means not spending too much time and resources on it. I.e., “mental resources spent” is part of the problem specification, and it’s something the GPS tries to minimize too.
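To make that concrete, here’s a toy sketch of the contrast I have in mind (purely illustrative; the function names and the `threshold`/`cost_per_eval` parameters are my own stand-ins, not anything from the post): a pure argmax over candidate plans versus a search that counts the resources it burns as part of the objective and stops at “good enough”.

```python
# Toy sketch only: contrasting a pure argmax planner with a satisficing
# planner that treats "mental resources spent" as part of the problem spec.
from typing import Callable, Iterable, Optional, Tuple

Plan = str  # hypothetical stand-in for whatever the GPS actually outputs


def argmax_planner(plans: Iterable[Plan], value: Callable[[Plan], float]) -> Plan:
    """Evaluate every candidate and return the single highest-scoring plan."""
    return max(plans, key=value)


def satisficing_planner(
    plans: Iterable[Plan],
    value: Callable[[Plan], float],
    threshold: float,       # satisficing threshold on plan quality
    cost_per_eval: float,   # "mental resources spent" per candidate considered
) -> Optional[Tuple[Plan, float]]:
    """Return the first plan whose value, net of search cost so far, clears the threshold."""
    spent = 0.0
    for plan in plans:
        spent += cost_per_eval
        if value(plan) - spent >= threshold:
            return plan, spent
    return None  # gave up: no plan was worth the resources it took to find
```

The point of the sketch is just that once search cost sits inside the objective, “optimize as hard as possible” stops being the default behavior.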
I don’t think this argmin is the central reason for grader-optimization problems here.
Really? I think that people usually don’t do that in life-or-death scenarios. People panic all the time.
I’m assuming no time pressure. Or substitute in “a matter of grave importance that you nonetheless feel capable of resolving”.
I don’t think this argmin is the central reason for grader-optimization problems here.
I’m going to read the rest of the essay, and also I realize you posted this before my four posts on “holy cow argmax can blow all your alignment reasoning out of reality all the way to candyland.” But I want to note that including an argmin in the posited motivational architecture makes me extremely nervous / distrusting. Even if this modeling assumption doesn’t end up being central to your arguments on how shard-agents become wrapper-like, I think this assumption should still be flagged extremely heavily.
Mm, I believe that it’s not central because my initial conception of the GPS didn’t include it at all, and everything still worked. I don’t think it serves the same role here as the one you’re critiquing in the posts you’ve linked; I think it’s inserted at a different abstraction level.
But sure, I’ll wait for you to finish with the post.