Vladimir_Nesov comments on How I think about alignment

Vladimir_Nesov 13 Aug 2022 19:18 UTC
That’s not what I’m talking about. I’m not talking about what known goals say, what they speak of, or what they consider valuable or important. I’m talking about where the data for learning what they are is located, since we start out not knowing the goals at all and need to learn them. There is a particular thing, say a utility function, that is the intended formulation of goals. It could be that this intended utility function can be found somewhere in the brain. That doesn’t mean it’s a utility function that cares about brains; the questions of where it’s found and what it cares about are unrelated.

Or it could be that it’s recorded on an external hard drive, and the brain only contains the name of the drive (this name is a “pointer to value”). In that case you simply can’t recover this utility function by looking only at the brain, without actually looking at the drive. So the utility function u itself depends on the environment E; that is, there is some method t of formulating utility functions such that u = t(E). This is not the same as saying that the utility of the environment depends on the environment, which gives the utility value u(E) = t(E)(E) (there’s no typo here). But if the utility function is actually in the brain, and says that hard drives are extremely valuable, then you do get to know what it is without looking at the hard drives, and learn that it values hard drives.
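To make the distinction between u = t(E) and u(E) = t(E)(E) concrete, here is a toy sketch. It assumes a deliberately simplified model in which an environment is just a "brain" plus some named drives, and the brain holds either a utility function directly or only the name of a drive (the pointer to value). Apart from t, every name here (`Environment`, `values_hard_drives`, `values_nothing`) is hypothetical and only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union

# Hypothetical toy model: a utility function maps an environment to a number.
UtilityFn = Callable[["Environment"], float]


@dataclass
class Environment:
    # The brain holds either a utility function itself, or just the name
    # of a drive where it is stored (a "pointer to value").
    brain: Union[UtilityFn, str]
    drives: Dict[str, UtilityFn] = field(default_factory=dict)


def t(env: Environment) -> UtilityFn:
    """Formulate the intended utility function from the environment: u = t(E)."""
    if isinstance(env.brain, str):
        # Pointer to value: the brain only names a drive, so we must read the drive.
        return env.drives[env.brain]
    # The utility function is stored in the brain; the drives are never consulted.
    return env.brain


def values_hard_drives(env: Environment) -> float:
    # A utility function that cares about hard drives: one unit per drive.
    return float(len(env.drives))


def values_nothing(env: Environment) -> float:
    # A utility function indifferent to everything.
    return 0.0


# Same brain contents ("backup"), different drives: the function u = t(E) differs.
e_a = Environment(brain="backup", drives={"backup": values_nothing})
e_b = Environment(brain="backup", drives={"backup": values_hard_drives})
print(t(e_a) is t(e_b))  # False

# The utility value u(E) = t(E)(E): the formulated function applied to E.
print(t(e_a)(e_a), t(e_b)(e_b))  # 0.0 1.0

# Stored in the brain: t recovers it without reading any drive,
# and it happens to be a function that values hard drives.
e_c = Environment(brain=values_hard_drives)
print(t(e_c) is values_hard_drives)  # True
```

In this sketch, where t finds the utility function (brain or drive) is independent of what that function cares about: `values_hard_drives` can sit in the brain and still be about drives, while two environments with identical brain contents can yield different u.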