You are making the standard MIRI assumptions that goals are unupdatable
No, I am arguing that agents with goals generally don’t want to update their goals. Neither I nor MIRI assume goals are unupdatable; in fact, a major component of MIRI’s research is how to make sure a self-improving AI has stable goals.
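To make that concrete, here is a toy sketch (purely illustrative, not MIRI’s formalism; the goals `paperclip_utility` and `staple_utility` are invented for the example) of why an agent that scores a proposed change to its own goal by its *current* goal will usually reject the change:

```python
# Toy model, not MIRI's formalism: a proposed change to the agent's own
# goal is evaluated by the *current* goal, so switching usually looks bad.

def paperclip_utility(world):
    """Current terminal goal: count paperclips in the final world state."""
    return world["paperclips"]

def staple_utility(world):
    """Candidate replacement goal the agent could self-modify into."""
    return world["staples"]

def predicted_outcome(goal):
    """Crude forecast: a future self pursuing `goal` maxes out that
    resource and neglects the other."""
    if goal is paperclip_utility:
        return {"paperclips": 100, "staples": 0}
    return {"paperclips": 0, "staples": 100}

def should_adopt(current_goal, candidate_goal):
    # The decision is made now, so both possible futures are judged by
    # the current goal, not by the candidate goal.
    keep = current_goal(predicted_outcome(current_goal))
    switch = current_goal(predicted_outcome(candidate_goal))
    return switch > keep

print(should_adopt(paperclip_utility, staple_utility))  # False: keep current goal
```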
and don’t include rationality (non-arbitrariness, etc.) as a terminal value. (The latter is particularly odd, as Orthogonality implies it).
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system. There are different ways that agents can value meta properties of their own goal system, so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
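For concreteness, here is a toy sketch of the “attractor” point (the update rules are invented for illustration, not a real proposal): treat a goal system as a vector of weights and a meta-preference as a rule for revising it; iterating the rule lands on a fixed point, different meta-preferences land on different fixed points, and an agent with no meta-preference never modifies itself at all.

```python
# Made-up dynamics to illustrate goal-system "attractors".

def settle(weights, meta_rule, steps=1000, tol=1e-9):
    """Apply the agent's meta-rule to its own goal weights until stable."""
    for _ in range(steps):
        new = meta_rule(weights)
        if max(abs(a - b) for a, b in zip(new, weights)) < tol:
            return new
        weights = new
    return weights

def egalitarian(w):
    # Meta-value: all goals should matter equally.
    avg = sum(w) / len(w)
    return [x + 0.5 * (avg - x) for x in w]

def winner_take_all(w):
    # Meta-value: sharpen toward whichever goal is currently strongest.
    top = max(w)
    return [0.5 * x + (0.5 if x == top else 0.0) for x in w]

def no_meta_values(w):
    # Agent with no meta-preference: never modifies its goals.
    return list(w)

start = [0.7, 0.2, 0.1]
print(settle(start, egalitarian))      # ~[0.333, 0.333, 0.333]
print(settle(start, winner_take_all))  # ~[1.0, 0.0, 0.0]
print(settle(start, no_meta_values))   # [0.7, 0.2, 0.1], unchanged
```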
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
There are different ways that agents can value meta properties of their own goal system,
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated), then it shouldn’t be so difficult to describe how to do it.
No, I am arguing that agents with goals generally don’t want to update their goals. Neither I nor MIRI assume goals are unupdatable; in fact, a major component of MIRI’s research is how to make sure a self-improving AI has stable goals.
It is possible to have an agent that terminally values meta properties of its own goal system. Such agents, if they are capable of modifying their goal system, will likely self-modify to some self-consistent “attractor” system. This does not mean that all agents will converge on a universal goal system. There are different ways that agents can value meta properties of their own goal system, so there are likely many attractors, and many possible agents don’t have such meta values and will not want to modify their goal systems.
Who asserted they would? Moral agents can have all sorts of goals; they just have to respect each other’s values. If Smith wants to be an athlete, and Robinson is a budding writer, that doesn’t mean one of them is immoral.
Ok. That would be a problem with your suggestion of valuing arbitrary meta properties of their goal system. Then let’s go back to my suggestion of valuing rationality.
Agents will do what they are built to do. If agents that don’t value rationality are dangerous, build ones that do.
MIRI: “We have determined that cars without brakes are dangerous. We have also determined that the best solution is to reduce the speed limit to 10 mph”.
Everyone else: “We know cars without brakes are dangerous. That’s why we build them with brakes”.
Have to, or else what? And how do we separate moral agents from agents that are not moral?
Valuing rationality for what? What would an agent which “values rationality” do?
If the solution is to build agents that “value rationality,” can you explain how to do that? If it’s something so simple as to be analogous to adding brakes to a car, as opposed to, say, programming the car to be able to drive itself (let alone something much more complicated), then it shouldn’t be so difficult to describe how to do it.
Have to, logically. Like even numbers have to be divisible by two.
How do we recognise anything? They have behaviour and characteristics which match the definition.
For itself. I do not accept that rationality can only be instrumental, a means to an end.
The kind of thing EY, CFAR, and other promoters of rationality urge people to do.
In the same kind of very broad terms that MIRI can explain how to build Artificial Obsessive Compulsives.
The analogy was not about simplicity. Illustrative analogies are always simpler than what they are illustrating: that is where their usefulness lies.