At the very least, we have strong theoretical reasoning models (like Bayesian reasoners, or Bayesian EU maximizers), which definitely do not go looking for values to pursue, or adopt new values.
This does not imply one cannot build an agent that works according to a different framework. VNM utility maximization requires a complete ordering of preferences, and does not say anything about where the ordering comes from in the first place. (But maybe your point was just that our current models do not “look for values”.)
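To make the point about VNM concrete, here is a minimal Python sketch, illustrative only (the names `choose`, `expected_utility` and the toy outcomes are hypothetical). The expected-utility decision rule takes the utility function as an input. The same machinery picks different lotteries under two different utility assignments, and nothing in it says where those assignments come from.

```python
from typing import Callable, Dict, List

Outcome = str
Lottery = Dict[Outcome, float]  # maps each outcome to its probability
Utility = Callable[[Outcome], float]

def expected_utility(lottery: Lottery, utility: Utility) -> float:
    """Probability-weighted sum of utilities over the lottery's outcomes."""
    return sum(p * utility(o) for o, p in lottery.items())

def choose(lotteries: List[Lottery], utility: Utility) -> Lottery:
    """VNM-style choice: pick the lottery with the highest expected utility.
    The utility function is supplied from outside; this rule never constructs it."""
    return max(lotteries, key=lambda lot: expected_utility(lot, utility))

# Two toy options: a sure outcome and a 50/50 gamble.
options: List[Lottery] = [
    {"tea": 1.0},
    {"coffee": 0.5, "nothing": 0.5},
]

# Two different (hypothetical) utility assignments over the same outcomes.
u_a_table = {"tea": 1.0, "coffee": 3.0, "nothing": 0.0}
u_b_table = {"tea": 2.0, "coffee": 3.0, "nothing": 0.0}

choice_a = choose(options, lambda o: u_a_table[o])
choice_b = choose(options, lambda o: u_b_table[o])
print(choice_a)  # {'coffee': 0.5, 'nothing': 0.5}
print(choice_b)  # {'tea': 1.0}
```

Under `u_a_table` the agent prefers the gamble (expected utility 1.5 vs. 1.0); under `u_b_table` it prefers the sure thing (2.0 vs. 1.5). The choice rule is identical in both cases; only the externally supplied ordering differs.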
Why would something which doesn’t already have values be looking for values? Why would conscious experiences and memory “seem valuable” to a system which does not have values already? Seems like having a “source of value” already is a prerequisite to something seeming valuable—otherwise, what would make it seem valuable?
An agent could have a pre-built routine or subagent that has a certain degree of control over what other subagents do: in a sense, it determines what the “values” of the rest of the system are. If this routine looks unbiased / rational / valueless, we have a system that considers some things valuable (acts to pursue them) without having a pre-value, or at least without a pre-value that looks like something we would consider a value.
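A minimal sketch of the kind of architecture described above, assuming a hypothetical `ValueSetter` routine and a hypothetical `Planner` subagent (neither comes from any real framework). The setter holds no preferences over outcomes; it derives weights from observations using one fixed rule, normalized frequency, which is chosen only because it does not look like a preference over outcomes. The planner then pursues whatever weights it is handed.

```python
from collections import Counter
from typing import Dict, List

Features = Dict[str, float]  # feature name -> how strongly an action produces it

class ValueSetter:
    """Hypothetical meta-level routine. It holds no preferences over outcomes;
    it derives weights from what it observes using one fixed rule
    (normalized frequency), chosen here purely for illustration."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, features_seen: List[str]) -> None:
        self.counts.update(features_seen)

    def weights(self) -> Dict[str, float]:
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {f: c / total for f, c in self.counts.items()}

class Planner:
    """Object-level subagent: pursues whatever weights it is handed."""

    def choose(self, actions: Dict[str, Features], weights: Dict[str, float]) -> str:
        def score(action: str) -> float:
            return sum(weights.get(f, 0.0) * v for f, v in actions[action].items())
        return max(actions, key=score)

setter = ValueSetter()
planner = Planner()
actions = {"explore": {"novelty": 1.0}, "rest": {"comfort": 1.0}}

setter.observe(["comfort", "comfort", "novelty"])
first = planner.choose(actions, setter.weights())   # comfort weighted 2/3 -> "rest"

setter.observe(["novelty", "novelty", "novelty"])
second = planner.choose(actions, setter.weights())  # novelty now weighted 2/3 -> "explore"
print(first, second)  # rest explore
```

The sketch is only structural: what the planner acts to pursue is fixed by the setter, not by the planner itself. Whether a rule like normalized frequency counts as a "pre-value" is exactly the question under discussion, and the sketch does not settle it.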
We do have real-world examples of things which do not themselves have anything humans would typically consider values, but do determine the values of the rest of some system. Evolution determining human values is a good example: evolution does not itself care about anything, yet it produced human values. Of course, if we just evolve some system, we don’t expect it to robustly end up with Good values; e.g. the Babyeaters (from Three Worlds Collide) are a plausible outcome as well. Just because we have a value-less system which produces values does not mean that the values produced are Good.
This example generalizes: we have some subsystem which does not itself contain anything we’d consider values. It determines the values of the rest of the system. But then, what reason do we have to expect that the values produced will be Good? The most common reason to believe such a thing is to predict that the subsystem will produce values similar to our own moral intuitions. But if that’s the case, then we’re using our own moral intuitions as the source-of-truth to begin with, which is exactly the opposite of moral realism.
To reiterate: the core issue with this setup is why we expect the value-less subsystem to produce something Good. How could we possibly know that, without using some other source-of-truth about Goodness to figure it out?
“How the physical world works” seems, to me, a plausible source-of-truth. In other words: I consider some features of the environment (e.g. consciousness) as a reason to believe that some AI systems might end up caring about a common set of things, after they’ve spent some time gathering knowledge about the world and reasoning. Our (human) moral intuitions might also be different from this set.