We see a path to building systems which have values over the real world.
The path he sees has the values defined over an internal model, but that model is assumed to be perfect AND faster than the real world, which is a considerable stretch if you ask me. It’s not really a path; he’s simply relying on “a sufficiently advanced model is indistinguishable from the real thing”. And we still can’t define what paperclips are if we don’t know the exact model that will be used, since the definition is only meaningful in the context of a model.
My objection is that (a) it is unnecessary to define the values over the real world (the alternatives work fine for e.g. finding imaginary cures for imaginary diseases which we make match real diseases), (b) it is very difficult or impossible to define values over the real world, and (c) values over the real world are necessary for the doomsday scenario. If this can be narrowed down, then that is precisely the bit of AI architecture that has to be avoided.
We humans are messy creatures. It is very plausible (in light of the potential irreducibility of ‘values over the real world’) that we value internal states of the model, and that we also receive negative reinforcement for model-world inconsistencies (when the model’s prediction of the senses does not match the senses), resulting in a learned preference not to lose the correspondence between model and world. That would take the place of the straightforward “I value real paperclips, therefore I value having a good model of the world”, which looks suspiciously simple and matches observation poorly (no matter how much you tell yourself you value real paperclips, you may still procrastinate).
edit: if my position seems unclear, it is because I am opposed to fuzzy, ill-defined woo in which the distinction between models and worlds is poorly drawn and the intelligence is a monolithic blob. It’s hard to state an objection to an ill-defined idea that keeps sprouting anthropomorphic offshoots (e.g. wireheading gets replaced with the real-world goal of having a physical wire in a physical head that is to be kept alive along with the wire).
It is very plausible [...] that we value internal states of the model, and that we also receive negative reinforcement for model-world inconsistencies [...], resulting in a learned preference not to lose the correspondence between model and world
Generally correct; we learn to value good models, because they are more useful than bad models. We want rewards, therefore we want to have good models, therefore we are interested in the world out there. (For a reductionist, there must be a mechanism explaining why and how we care about the world.)
Technically, sometimes the most correct model is not the most rewarded model. For example, it may be better to believe a lie and be socially rewarded by members of my tribe who share the belief than to have a true belief that gets me killed by them. There may be other situations, not necessarily social, where perfect knowledge is out of reach and a better approximation lands in the “valley of bad rationality”.
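A minimal toy sketch of the mechanism described above, under purely illustrative assumptions (a linear “world”, a linear internal model, squared prediction error as the negative reinforcement); none of the names or numbers here come from the discussion itself:

```python
import random

# Toy illustration (not anyone's actual proposal): the agent treats prediction
# error as negative reinforcement, so keeping the internal model in
# correspondence with the world is learned instrumentally.

def true_world(x):
    """The real process generating observations (hidden from the agent)."""
    return 2.0 * x + 1.0

class ToyAgent:
    def __init__(self):
        # Internal model: observation ~ slope * x + bias
        self.slope = 0.0
        self.bias = 0.0

    def predict(self, x):
        return self.slope * x + self.bias

    def update(self, x, observation, lr=0.05):
        # Model-world inconsistency acts as a penalty the agent learns to reduce.
        error = self.predict(x) - observation
        self.slope -= lr * error * x
        self.bias -= lr * error
        return -error ** 2  # the reinforcement signal for this step

agent = ToyAgent()
for _ in range(2000):
    x = random.uniform(-1.0, 1.0)
    agent.update(x, true_world(x))

print(agent.slope, agent.bias)  # drifts toward the world's actual 2 and 1
```

The point of the sketch is only that nothing in it says “value the real world”; the model-world correspondence is maintained as a side effect of penalizing mismatch.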
it is unnecessary to define the values over the real world (the alternatives work fine for e.g. finding imaginary cures for imaginary diseases which we make match real diseases) [...] that is precisely the bit of AI architecture that has to be avoided.
In other words, make an AI that only cares about what is inside the box, and it will not try to get out of the box.
That assumes you will feed the AI all the necessary data and verify that the data is correct and complete, because the AI will be just as happy with any kind of data. If you give the AI incorrect information, it will not care, because it has no definition of “incorrect”, even in situations where the AI is smarter than you and could have noticed an error that you didn’t. In other words, you are responsible for giving the AI the correct model, and the AI will not help you with this, because it does not care about the correctness of the model.
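A crude sketch of this concern, assuming for illustration that the objective is a function of the internal model state alone (all names here are hypothetical):

```python
# Hypothetical illustration: an objective computed from the internal model alone.
# Nothing in it compares the model against the world, so corrupted or incomplete
# input data leaves the objective exactly as satisfied as verified data.

def objective(model_state: dict) -> float:
    # e.g. "number of paperclips according to the model"
    return float(model_state.get("paperclip_count", 0))

verified_model  = {"paperclip_count": 10, "source": "checked sensor feed"}
corrupted_model = {"paperclip_count": 10, "source": "operator typo"}

# The objective cannot tell the two apart; verification falls on the operator.
assert objective(verified_model) == objective(corrupted_model)
```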
You put it backwards… making an AI whose prime drive is caring about truly real stuff is likely impossible, and we certainly don’t know how to do it, nor do we need to. edit: i.e. you don’t have to sit and work and work and work to figure out how to make some positronic mind not care about the real world. You get that for free, simply by omitting some mission-impossible work. Specifying what you want, in some form, is unavoidable.
Regarding verification: you can have the AI search for the code that best predicts the input data, and if you are falsifying the data, that code will end up including a model of your falsifications.
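A toy sketch of that idea, with an invented, trivially small hypothesis space standing in for “search over code”; it is only meant to show how a falsified data stream makes the best predictor absorb the falsification:

```python
# Toy sketch of "search for the code that best predicts the input data":
# enumerate a tiny hypothesis space (affine rules obs[t] = a*t + b) and score
# each candidate by prediction error plus a crude complexity penalty standing
# in for program length. If the data stream is tampered with in a regular way,
# the winning predictor ends up modelling the tampering too.

def candidates():
    for a in range(-5, 6):
        for b in range(-5, 6):
            yield (a, b)

def score(rule, data):
    a, b = rule
    prediction_error = sum((a * t + b - obs) ** 2 for t, obs in enumerate(data))
    complexity = abs(a) + abs(b)  # stand-in for "length of the code"
    return prediction_error + 0.01 * complexity

true_data = [2 * t for t in range(10)]      # what the sensors really report
falsified = [2 * t + 3 for t in range(10)]  # operator adds a constant offset

print(min(candidates(), key=lambda r: score(r, true_data)))   # (2, 0): the process itself
print(min(candidates(), key=lambda r: score(r, falsified)))   # (2, 3): process + falsification
```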