When you say “values”, do you mean instrumental values, or do you mean terminal values? If the former then the answer is simple. This is what we spend most of our time doing. Will tweaking my diet in this way cause me to have more energy? Will asking my friend in this particular way cause them to accept my request? Etc. This is as mundane as it gets.
If the latter, the answer is a bit more complicated, but really it shouldn’t be all that confusing. As agents, we’re built with motivation systems, where out of all possible sensory patterns, some present to us as neutral, others as inherently desirable, and the last subset as inherently undesirable. Some things can be more or less desirable than others, so each of these sensory components runs along at least one dimension.
Sensory patterns that originally present as inherently neutral can go one of two ways. They may be left as irrelevant: these are the things put on auto-ignore, which are apt to return to one’s conscious awareness if certain substances are taken, or if careful introspection is engaged in. Or they may acquire a ‘secondary’ desirability or undesirability by being seen to stand in causal connection with something that presents as inherently one way or the other, for example finding running enjoyable because of positive benefits the activity has produced in the past.
Thus to discover one’s terminal values, one must simply identify these inherently desirable sensory patterns and figure out which ones top the list as ‘most desirable’ (in terms of nothing other than how they strike one’s perception). A good heuristic is to see what other people consider enjoyable or fun, try it, and see what happens, while making sure to disentangle any identity issues from the result, such as sexual hangups making one unable to enjoy something widely considered to have one of the strongest ‘wanting to engage in this behavior because it’s so great’ effects: sexual or romantic interaction.
But at the most fundamental, there’s nothing to the task of figuring out one’s terminal values other than simply figuring out what sensory patterns are most ‘enjoyable’ in the most basic sort of way imaginable, on a timescale sufficiently long-term to be something one would be unlikely to refer to as ‘akrasia’. Even someone literally physically unable to experience certain positive sensory patterns, such as someone with extremely low libido because of physiological problems, would most likely qualify as making a ‘good choice’ by engaging in a course of action apt to make those sensory patterns available again, such as implementing a lifestyle protocol likely to fix the physiological issues and bring their libido to a healthy level.
It gets somewhat confusing when you factor in that the sensory patterns one is able to experience can shift over time (libido increasing or decreasing, going through puberty, and so on), along with akrasia and other problems that make us seem less coherent as agents. But I believe all the fog can be cut through by simply observing that sensory patterns present to us as either neutral, inherently desirable, or inherently undesirable, and that the latter two run on a dimension of ‘more or less’. Neutral sensory patterns acquire a ‘secondary’ quality on these dimensions depending on what the agent believes their causal connections to other sensory patterns to be, with each chain ultimately needing to run up against an ‘inherently motivating’ sensory pattern to acquire significance.
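The model sketched above can be put in toy computational form. To be clear, this is only an illustrative sketch of the idea as I read it: every name, weight, and pattern here (`inherent`, `causes`, `valence`, and the numbers) is made up for illustration, not part of the original claim.

```python
# Toy formalization (all names and numbers hypothetical): sensory patterns
# carry an inherent valence (negative, zero, or positive, with magnitude),
# and neutral patterns acquire a 'secondary' valence from their believed
# causal links to other patterns.

inherent = {
    "sweet_taste": 0.8,    # inherently desirable
    "sharp_pain": -1.0,    # inherently undesirable
    "running": 0.0,        # inherently neutral
    "alarm_sound": 0.0,    # inherently neutral
}

# believed causal connections: pattern -> list of (effect, believed strength)
causes = {
    "running": [("sweet_taste", 0.5)],     # linked to a past positive benefit
    "alarm_sound": [("sharp_pain", 0.3)],  # predicts something bad
}

def valence(pattern, depth=3):
    """Inherent valence plus discounted valence of believed effects.
    A causal chain contributes nothing unless it ultimately runs up
    against an inherently motivating pattern."""
    v = inherent.get(pattern, 0.0)
    if depth == 0:
        return v
    for effect, strength in causes.get(pattern, []):
        v += strength * valence(effect, depth - 1)
    return v

print(valence("sweet_taste"))  # 0.8: inherent only
print(valence("running"))      # 0.4: neutral, but causally linked to a positive
```

On this sketch, ‘discovering one’s terminal values’ amounts to reading off the entries of `inherent` with the largest positive values, while the `causes` table only ever confers secondary significance.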
really it shouldn’t be all that confusing. As agents, we’re built with motivation systems, where out of all possible sensory patterns, some present to us as neutral, others as inherently desirable, and the last subset as inherently undesirable.
While I sympathize with you, I think you should decrease your threshold for apparent difficulty of problems.
For example, you should be able to choose between things that will make no sensory difference to you, such as the well-being of people in Xela. And of course you dodge the question of what is “enjoyable”—is a fistfight enjoyable if it makes you grin and your heart race but afterwards you never want to do it again? What algorithm should an AI follow to decide? You have to try and reduce “enjoyable” to things like “things you’d do again” or “things that make your brain release chemical cocktail X.” And then you have to realize that those definitions are best met by meth, or an IV of chemical cocktail X, not by cool stuff like riding dinosaurs or having great sex.
For example, you should be able to choose between things that will make no sensory difference to you, such as the well-being of people in Xela.
This is an example of the sort of loose terminology that leads most people into the fog on these sorts of problems. If it makes no sensory difference, then it makes no sensory difference, and there’s nothing to care about, as there’s nothing to decide between. You can’t choose between two identical things.
Or to be more charitable, I should say that what seems to have happened here is that I was using the term “sensory pattern” to refer to any and all subjective experience (whatever appears in one’s visual field, and so on), whereas you seem to be using the phrase “makes no sensory difference” to refer to the subset of subjective experience we call ‘the real world’.
True, if I’ve never been to Xela, the well-being of the people there (presumably) makes no difference to my experience of everyday things in the outside world, such as the people I know, or what’s going on in the places I do go. But this is not a problem. Mention the place, and explain the conditions in detail, employing colorful language and eloquent description, and before long there will be a video playing in my mind, apt to make me happy or sad, depending on the well-being of the people therein.
And of course you dodge the question of what is “enjoyable”—is a fistfight enjoyable if it makes you grin and your heart race but afterwards you never want to do it again?
I don’t see the contradiction. Unless I’m missing something in my interpretation of your example, all that must be said is that the experience was enjoyable because certain dangers didn’t play out, such as getting injured or being humiliated, but you’d rather not repeat that experience, for you may not be so lucky in the future. Plenty of things are enjoyable unless they go wrong, and are rather apt to go wrong, and thus are candidates for being something one enjoys but would rather not repeat.
For example, let’s say you get lost in the moment, and have unprotected sex. You didn’t have any condoms or anything, but everything else was perfect, so you went for it. You have the time of your life. After the fact you manage to put the dangers out of your mind, and just remember how excellent the experience was. Eventually it becomes clear that no STIs were transmitted, nor is there an unplanned pregnancy. The experience, because nothing went wrong, was excellent. But you decide it was a mistake.
There seems to be a contradiction here: saying that the experience was excellent, but that it was a mistake. But the missing piece that makes it seem contradictory is the time factor. Once enough time passes with nothing going wrong, one can say conclusively that nothing went wrong: in hindsight, it was excellent with certainty. But at the time of the event, the odds were much worse. That’s all.
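The time-factor point can be put in expected-value terms. A minimal sketch with entirely made-up numbers, assuming the act is very enjoyable if nothing goes wrong and very costly if something does:

```python
# Illustrative numbers (all invented): ex-ante vs ex-post evaluation of a
# risky-but-enjoyable act, as in the unprotected-sex example above.
value_if_fine = 10.0    # how good the experience is when nothing goes wrong
value_if_bad = -100.0   # cost if an STI or unplanned pregnancy results
p_bad = 0.2             # believed ex-ante probability of a bad outcome

# At decision time, you can only weigh the outcomes by their probabilities:
ex_ante = (1 - p_bad) * value_if_fine + p_bad * value_if_bad
print(ex_ante)  # -12.0: negative, so choosing it was a mistake

# Ex post, once enough time has passed to rule the dangers out, the bad
# outcome's probability has collapsed to zero, leaving only the good one:
ex_post = value_if_fine
print(ex_post)  # 10.0: the experience itself really was excellent
```

So “excellent, but a mistake” is no contradiction: the two judgments are made against different probability distributions, one before the dice were rolled and one after.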
What algorithm should an AI follow to decide?
This seems off topic. Decide what? I thought we were talking about how to discover one’s terminal values as a human.
You have to try and reduce “enjoyable” to things like “things you’d do again” or “things that make your brain release chemical cocktail X.” And then you have to realize that those definitions are best met by meth, or an IV of chemical cocktail X, not by cool stuff like riding dinosaurs or having great sex.
Well if that’s the case then they’re unhelpful definitions. As far as I can see, nothing in my post would suggest a theory weak enough to output something like ‘do meth’, or ‘figure out how to wirehead’.
While I sympathize with you, I think you should decrease your threshold for apparent difficulty of problems.
Along with what I just posted, I should also mention that I did say these two lines:
at the most fundamental, there’s nothing to the task of figuring out one’s terminal values other than simply figuring out what sensory patterns are most ‘enjoyable’ in the most basic sort of way imaginable, on a timescale sufficiently long-term to be something one would be unlikely to refer to as ‘akrasia’
It gets somewhat confusing when you factor in [...] akrasia and other problems that make us seem less coherent as agents
Those seem to suggest I wasn’t being as naive as your reply seems to imply.