Thank you for the detailed answer! I’ll read Three Worlds Collide.
That brings us to the real argument: why does the moral realist believe this? “What do I think I know, and how do I think I know it?” What causal, physical process resulted in that belief?
I think a world full of people who are always blissed out is better than a world full of people who are always depressed or in pain. I don’t have a complete ordering over world-histories, but I am confident in this single preference, and if someone called this “objective value” or “moral truth” I wouldn’t say they are clearly wrong. In particular, if someone told me that there exists a certain class of AI systems that end up endorsing the same single preference, and that these AI systems are way less biased and more rational than humans, I would find all that plausible. (Again, compare this if you want.)
Now, why do I think this?
I am a human and I am biased by my own emotional system, but I can still try to imagine what would happen if I stopped feeling emotions. I think I would still consider the happy world more valuable than the sad world. Is this proof that objective value is a thing? Of course not. At the same time, I can also imagine an AI system thinking: “Look, I know various facts about this world. I don’t believe in golden rules written in fire etched into the fabric of reality, or divine commands about what everyone should do, but I know there are some weird things that have conscious experiences and memory, and this seems something valuable in itself. Moreover, I don’t see other sources of value at the moment. I guess I’ll do something about it.” (Taken from this comment)
> Look, I know various facts about this world. I don’t believe in golden rules written in fire etched into the fabric of reality, or divine commands about what everyone should do, but I know there are some weird things that have conscious experiences and memory, and this seems something valuable in itself. Moreover, I don’t see other sources of value at the moment. I guess I’ll do something about it.
Why would something which doesn’t already have values be looking for values? Why would conscious experiences and memory “seem valuable” to a system which does not have values already? Seems like having a “source of value” already is a prerequisite to something seeming valuable—otherwise, what would make it seem valuable?
At the very least, we have strong theoretical reasoning models (like Bayesian reasoners, or Bayesian EU maximizers), which definitely do not go looking for values to pursue, or adopt new values.
> At the very least, we have strong theoretical reasoning models (like Bayesian reasoners, or Bayesian EU maximizers), which definitely do not go looking for values to pursue, or adopt new values.
This does not imply one cannot build an agent that works according to a different framework. VNM utility maximization requires a complete preference ordering, but says nothing about where that ordering comes from in the first place. (But maybe your point was just that our current models do not “look for values”.)
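To make that point concrete, here is a minimal sketch (all names are mine, purely illustrative) of a standard expected-utility maximizer. The utility function is an exogenous input: nothing in the decision loop searches for values or revises them.

```python
def choose_action(actions, outcomes, belief, utility):
    """Pick the action with highest expected utility.

    belief[(action, outcome)]: subjective probability of outcome given action
    utility[outcome]: a fixed, externally supplied valuation of each outcome
    """
    def expected_utility(action):
        return sum(belief[(action, o)] * utility[o] for o in outcomes)
    # The agent only ever *uses* `utility`; it never modifies it or
    # goes looking for a replacement.
    return max(actions, key=expected_utility)

# A complete preference ordering over outcomes, supplied from outside
# the agent, as VNM requires.
outcomes = ["happy_world", "neutral_world", "sad_world"]
utility = {"happy_world": 1.0, "neutral_world": 0.0, "sad_world": -1.0}
belief = {
    ("act_a", "happy_world"): 0.6, ("act_a", "neutral_world"): 0.3, ("act_a", "sad_world"): 0.1,
    ("act_b", "happy_world"): 0.1, ("act_b", "neutral_world"): 0.3, ("act_b", "sad_world"): 0.6,
}
print(choose_action(["act_a", "act_b"], outcomes, belief, utility))  # prints act_a
```

The framework is silent on where `utility` comes from; swapping in a Babyeater-style utility function changes the behavior without changing the agent at all.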
> Why would something which doesn’t already have values be looking for values? Why would conscious experiences and memory “seem valuable” to a system which does not have values already? Seems like having a “source of value” already is a prerequisite to something seeming valuable—otherwise, what would make it seem valuable?
An agent could have a pre-built routine or subagent with a certain degree of control over what the other subagents do; in effect, it determines the “values” of the rest of the system. If this routine looks unbiased / rational / value-free, we have a system that treats some things as valuable (acts to pursue them) without having a pre-existing value, or at least with a “pre-value” that doesn’t look like anything we would ordinarily call a value.
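A toy sketch of that architecture (all names are mine, purely illustrative): the value-setter contains nothing we would ordinarily call a value, only a mechanical rule over observed world-features; the executive then pursues whatever the value-setter handed it.

```python
def value_setter(world_features):
    """Mechanical rule: weight 1.0 for any feature tagged as involving
    conscious experience, 0.0 otherwise. The rule itself has no
    preferences over outcomes; it only inspects features."""
    return {f: (1.0 if meta["conscious"] else 0.0)
            for f, meta in world_features.items()}

def executive(values, options):
    """Pick the option scoring highest under the supplied values."""
    def score(option):
        return sum(values.get(f, 0.0) for f in option)
    return max(options, key=score)

world_features = {
    "human": {"conscious": True},
    "rock": {"conscious": False},
}
# The executive's "values" are entirely determined by the value-setter.
values = value_setter(world_features)
print(executive(values, [("human",), ("rock",)]))  # prints ('human',)
```

Whether the `if meta["conscious"]` rule counts as a value smuggled in, or as a value-free inference from facts about the world, is exactly the point under dispute.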
We do have real-world examples of things which do not themselves have anything humans would typically consider values, but do determine the values of the rest of some system. Evolution determining human values is a good example: evolution does not itself care about anything, yet it produced human values. Of course, if we just evolve some system, we don’t expect it to robustly end up with Good values—e.g. the Babyeaters (from Three Worlds Collide) are a plausible outcome as well. Just because we have a value-less system which produces values, does not mean that the values produced are Good.
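The evolution example can be caricatured in a few lines (all names are mine, purely illustrative): the selection step has no values of its own, it just keeps whatever scores highest on a fitness criterion, yet the surviving agents end up with “values” (preference weights). Nothing guarantees those values are Good; a different fitness landscape produces different values just as readily.

```python
import random

def evolve(fitness, generations=50, pop_size=20):
    """Selection without values: keep the top half by `fitness`,
    refill with mutated copies, return the fittest survivor."""
    # An agent's "values" are just a weight vector over two drives.
    pop = [[random.random(), random.random()] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # value-free selection
        survivors = pop[: pop_size // 2]
        pop = survivors + [[w + random.gauss(0, 0.05) for w in v]
                           for v in survivors]  # mutated offspring
    return max(pop, key=fitness)

random.seed(0)
# A fitness landscape rewarding drive 0 produces agents that "care"
# about drive 0...
print(evolve(lambda v: v[0]))
# ...while a different landscape produces agents with opposite values.
print(evolve(lambda v: v[1]))
```

The `evolve` function is the value-less subsystem; the weight vectors it outputs are the produced values, and they track the fitness criterion, not Goodness.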
This example generalizes: we have some subsystem which does not itself contain anything we’d consider values. It determines the values of the rest of the system. But then, what reason do we have to expect that the values produced will be Good? The most common reason to believe such a thing is to predict that the subsystem will produce values similar to our own moral intuitions. But if that’s the case, then we’re using our own moral intuitions as the source-of-truth to begin with, which is exactly the opposite of moral realism.
To reiterate: the core issue with this setup is why we expect the value-less subsystem to produce something Good. How could we possibly know that, without using some other source-of-truth about Goodness to figure it out?
“How the physical world works” seems, to me, a plausible source-of-truth. In other words: I take some features of the environment (e.g. consciousness) as a reason to believe that some AI systems might end up caring about a common set of things, after they’ve spent some time gathering knowledge about the world and reasoning. Our (human) moral intuitions might also differ from this set.