I want something a bit more detailed, grounded in an example. Take the example you quote of Eliezer buying a lottery ticket and resolving only to care about winners: what goes through that person’s mind as they wake up? What algorithms do they use? I’ll give it a shot.
So, they’re a person. They look at the lottery ticket in their hand. It’s not a winner. They remember “them” (flashing warning sign) resolving only to care about winners. They think “wtf.”
Okay, so, back to the flashing warning sign. How do they know they were the one who resolved that? There’s a fork in the road here. One way to identify yourself in memory is through a world model: you label one person in your model of the world “yourself,” and that’s easy, since “yourself” is just the person all of your memories are from the perspective of. The other way is model-less and label-less: a first-person prediction of UNDEFINED is automatically assumed to conflict with a state of not-UNDEFINED, and the conflict comes out as “wtf.” The “self” isn’t represented anywhere; it falls out of the structure of how memories are accessed, compared, and what gets output. Biological systems seem far more likely to be the second kind.
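To make the fork concrete, here’s a toy sketch of the two routes in Python. It’s purely illustrative: every name in it (Memory, model_based_self, and so on) is invented for the example, and none of it is meant as a claim about how biological machinery actually implements either route.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    perspective: str   # whose point of view the memory is from
    content: str

def model_based_self(world_model: list[str], memories: list[Memory]) -> str | None:
    """Route 1: scan an explicit world model and attach a 'self' label to
    whichever person all the memories are from the perspective of."""
    for person in world_model:
        if all(m.perspective == person for m in memories):
            return person
    return None

def model_less_reaction(predicted: str, observed: str) -> str:
    """Route 2: no model, no label. A first-person prediction that conflicts
    with what's observed just outputs 'wtf' directly; the 'self' lives only
    in the structure of the access-and-compare machinery."""
    return "wtf" if predicted != observed else "carry on"

memories = [
    Memory("Eliezer", "bought ticket"),
    Memory("Eliezer", "resolved to only care about winners"),
]
print(model_based_self(["Eliezer", "clerk"], memories))   # Eliezer
print(model_less_reaction("doesn't care about losing",
                          "holding a losing ticket and caring"))  # wtf
```

Note that only route 1 ever produces an explicit “self” token that later reasoning could inspect or re-bind; route 2 just reacts.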
Now, for present Eliezer contemplating the future, I think you do need abstraction. But if Eliezer-contemplating-the-past is running on biological emergent-pilot, then Eliezer-contemplating-the-future can’t change how he responds to the past by changing abstractions: the “wtf” is generated below the level where the abstractions live.
Not that a utility maximizer would gain anything by changing who they self-identified with. After all, you take actions to maximize your current utility function, not the utility function you’ll have in the future—and this includes adopting new utility functions. In a simple game where you can pay money to only self-identify with winners in a gamble, utility maximizers won’t pay the money.
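Here’s a minimal expected-value sketch of that game. The numbers (P_WIN, PRIZE, FEE) are all made up for illustration; the point is just that the current utility function scores both options over the same outcome distribution, so paying the fee is dominated by exactly the fee.

```python
# Toy game: pay FEE now to "only self-identify with winners" of a gamble.
# All values are assumptions for illustration.

P_WIN = 0.01      # probability the ticket wins (assumed)
PRIZE = 100.0     # payout if it wins (assumed)
FEE = 1.0         # cost of the self-identification option (assumed)

def expected_current_utility(pay_fee: bool) -> float:
    """The *current* utility function still cares about losing branches;
    the fee only changes what the future function cares about, and the
    current function assigns no value to that change."""
    base = P_WIN * PRIZE + (1 - P_WIN) * 0.0
    return base - FEE if pay_fee else base

print("don't pay:", expected_current_utility(False))  # 1.0
print("pay:      ", expected_current_utility(True))   # 0.0
# A maximizer of current utility picks the larger value, so it never pays.
```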