But the planner’s actual terminal value of satisfying the shard economy’s weighted preferences...
Suppose I have a heuristic that fires strongly when I’m eating cupcakes or thinking thoughts that will lead to eating cupcakes, and then a control algorithm that fires my muscles so as to bring about the states of affairs corresponding to whatever thoughts a heuristic in its input slot rates highly, and this control algorithm is hooked up to the cupcake heuristic.
Saying that the “actual terminal value” of the control algorithm is to satisfy whatever heuristic is in its input slot (and so changing the heuristic to one that fires for strawberries wouldn’t be a big deal) is kinda wrong. Saying that the “actual terminal value” of the control algorithm is cupcakes and nothing else is also kinda wrong. They’re both kinda wrong because trying to declare one thing the “actual terminal value” is the wrong exercise to be engaging in in the first place!
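To make the two readings concrete, here’s a toy sketch (all of the names and the string-matching “heuristics” are invented purely for illustration; this isn’t a claim about how anything is actually implemented):

```python
from typing import Callable, List

# A "heuristic" here is just something that scores an imagined outcome.
Heuristic = Callable[[str], float]

def cupcake_heuristic(outcome: str) -> float:
    """Fires strongly for cupcake-related outcomes."""
    return 1.0 if "cupcake" in outcome else 0.0

def strawberry_heuristic(outcome: str) -> float:
    """Fires strongly for strawberry-related outcomes."""
    return 1.0 if "strawberry" in outcome else 0.0

class ControlAlgorithm:
    """Acts so as to realize whichever candidate outcome its plugged-in heuristic rates highest."""

    def __init__(self, heuristic: Heuristic):
        self.heuristic = heuristic  # the "input slot"

    def act(self, candidate_outcomes: List[str]) -> str:
        return max(candidate_outcomes, key=self.heuristic)

controller = ControlAlgorithm(cupcake_heuristic)
print(controller.act(["eat a cupcake", "eat a strawberry"]))  # "eat a cupcake"

# Reading 1: the controller's "actual terminal value" is whatever sits in the slot,
# so swapping the heuristic is no big deal.
controller.heuristic = strawberry_heuristic
print(controller.act(["eat a cupcake", "eat a strawberry"]))  # "eat a strawberry"

# Reading 2: the controller-plus-cupcake-heuristic system terminally values cupcakes,
# so the swap changes what the whole system is "about".
# Nothing in the code privileges one reading over the other.
```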
This is related to my other warning about the word “actual”: this idea that you’re “actually” the control algorithm and not the cupcake heuristic. There are multiple ways to think about you that work better or worse in different contexts (since I just finished editing a sequence about this, I will shamelessly link it). I am so large I don’t just contain multitudes, I contain multitudes of ways of parceling myself up into multitudes.
They’re both kinda wrong because trying to declare one thing the “actual terminal value” is the wrong exercise to be engaging in in the first place!
I disagree. I’m not talking about the intentional stance or other such “external” descriptions. I’m claiming that if you took the explicit algorithmic implementation of the human mind and looked over it, you would find some kind of distinct “planner” part, and that part would be something like an idealized utility-maximizer with a pointer to the shard economy in place of its utility function.
It’s not a frame that can be kinda wrong/awkward to use. It’s a specific mechanistic prediction that’s either flat-out right or flat-out wrong.
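Here’s a minimal sketch of the shape I’m predicting, with the shard economy’s weighted scores sitting where a utility function would sit (the particular shards and weights are made up purely for illustration):

```python
from typing import Callable, Dict, List

# Each shard scores an imagined outcome; the weights encode the current shard economy.
Shard = Callable[[str], float]

def shard_economy_value(shards: Dict[str, Shard],
                        weights: Dict[str, float],
                        outcome: str) -> float:
    """Weighted sum of shard evaluations: what sits where a utility function would."""
    return sum(weights[name] * shard(outcome) for name, shard in shards.items())

def planner(candidate_plans: List[str],
            shards: Dict[str, Shard],
            weights: Dict[str, float]) -> str:
    """An idealized maximizer over plans, holding only a pointer to the shard economy."""
    return max(candidate_plans, key=lambda plan: shard_economy_value(shards, weights, plan))

# Hypothetical shards and weights, purely for illustration.
shards = {
    "sugar": lambda outcome: 1.0 if "cupcake" in outcome else 0.0,
    "social": lambda outcome: 1.0 if "friend" in outcome else 0.0,
}
weights = {"sugar": 0.3, "social": 0.7}

print(planner(["eat a cupcake alone", "share a cupcake with a friend"], shards, weights))
# "share a cupcake with a friend": the planner itself is indifferent to cupcakes;
# it just maximizes whatever the pointer hands it.
```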
This is related to my other warning about the word “actual”: this idea that you’re “actually” the control algorithm and not the cupcake heuristic
Mm, I’m more willing to relax this assumption. It ties into my model of self-awareness — I suspect it might be the case that the planner is the thing that’s being fed summaries of the brain’s state, making it literally the thing that’s having qualia. But I haven’t fully worked out my model of that.