I would question the framing of mental subagents as “mesa optimizers” here. This sneaks in an important assumption: namely that they are optimizing anything. I think the general view of “humans are made of a bunch of different subsystems which use common symbols to talk to one another” has some merit, but I think this post ascribes a lot more agency to these subsystems than I would. I view most of the subagents of human minds as mechanistically relatively simple.
For example, I might reframe a lot of the elements of talking about the unattainable “object of desire” in the following way:
1. Human minds have a reward system which rewards thinking about “good” things we don’t have (or else we couldn’t ever do things)
2. Human thoughts ping from one concept to adjacent concepts
3. Thoughts of good things associate to an assessment of our current state
4. Thoughts of our current state being lacking cause a negative emotional response
5. The reward signal fails to backpropagate to the reward system in step 1 strongly enough, so the thoughts of “good” things we don’t have are reinforced
6. The cycle continues
I don’t think this is literally the reason, but framings on this level seem more mechanistic to me.
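To make “this level” concrete, here’s a toy random-walk sketch of steps 1 through 6 (mine, purely illustrative); every name, weight, and number in it is made up, and it isn’t a claim about how the actual reward system works.

```python
# Toy sketch of steps 1-6: a random walk over thoughts whose "salience"
# grows whenever the reward system fires. All values here are invented.
import random

random.seed(0)

# Steps 2-3: which thoughts lead to which other thoughts.
associations = {
    "desired_thing": ["current_state", "unrelated"],
    "current_state": ["desired_thing", "unrelated"],
    "unrelated":     ["desired_thing", "unrelated"],
}

# Step 1: how rewarding each thought is to dwell on; this steers the walk.
salience = {"desired_thing": 1.0, "current_state": 1.0, "unrelated": 1.0}

mood = 0.0
thought = "unrelated"

for _ in range(200):
    if thought == "desired_thing":
        # Step 1 again: dwelling on the unattained "good" thing is rewarded,
        # so its salience keeps growing.
        salience["desired_thing"] += 0.05
    elif thought == "current_state":
        # Step 4: noticing the lack feels bad...
        mood -= 0.05
        # Step 5: ...but (in this toy) none of that negative signal reaches
        # the reward system, so the salience is never discounted.
    # Step 2: hop to an associated thought, weighted by salience.
    options = associations[thought]
    thought = random.choices(options, weights=[salience[t] for t in options])[0]

# Step 6: the walk ends up stuck in the desire/lack loop while mood sinks.
print(f"final mood {mood:.2f}, salience of desired thing {salience['desired_thing']:.2f}")
```

Run long enough, the walk spends nearly all of its time bouncing between the desired thing and the assessment of lacking it, which is the “cycle continues” part; nothing in there needs to be an optimizer, just associations plus a reward signal that only flows one way.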
I also think that any framings along the lines of “you are lying to yourself all the way down and cannot help it” and “literally everyone is messed up in some fundamental way and there are no humans who can function in a satisfying way” are just kind of bad. Seems like a Kafka trap to me.
I’ve spoken elsewhere about the human perception of ourselves as a coherent entity being a misfiring of systems which model others as coherent entities (for evolutionary reasons). I don’t particularly think some sort of societal pressure is the primary reason for our thinking of ourselves as coherent, although societal pressure is certainly to blame for the instinct to repress certain desires.
It fails the Insanity Wolf Sanity Check.
And imagine that the person pushing this drug had instead said, “I am lying to myself all the way down and cannot help it”, and “literally everyone including me etc.” Well, no need to pay any more attention to them.
I actually like “mesa-optimizer” because it implies less agency than “subagent”. A mesa-optimizer in AI or evolution is a thing created to implement a value of its meta-optimizer, and the alignment problem is precisely that a mesa-optimizer isn’t necessarily smart enough to actually optimize anything, especially not the thing it was created for. It’s an adaptation-executor rather than a fitness-maximizer, whereas “subagent” implies (at least to me) a thing that has some sort of “agency” or goals that it seeks.
I think if you are always around people who are messed up, then you’ll conclude that humans are fundamentally messed up, and eventually you stop noticing all the normal people who have functional lives and are mostly content. And since (in my opinion) the entirety of WEIRD culture is about as messed up as a culture can be, you’ll never meet any functional humans unless you take a vacation to spend time with people you might perceive as uneducated, stupid, poor, or foreign.
Pretty well-traveled WEIRD culture member here, requesting an explanation.