I think any sufficiently rigorous insight that allows you to build a satisficer of some property will also allow you (or someone else who knows your insight) to build a maximizer of the same property, so research into satisficers doesn’t seem to be safe from an FAI point of view.
any sufficiently rigorous insight that allows you to build a satisficer of some property will also allow you (or someone else who knows your insight) to build a maximizer of the same property
I’m sure this is wrong. There are cases where provably good heuristics are known, and precise solutions are believed intractable. Traveling salesman comes to mind. It follows necessarily that there are tasks that can be done “well enough”, but not perfectly.
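To make the travelling salesman point concrete, here is a minimal sketch rather than anything authoritative. It assumes Euclidean distances (so the triangle inequality holds) and uses the textbook heuristic of walking a minimum spanning tree in preorder, which is known to give a tour at most twice the optimal length. The function names and the sample points are hypothetical, and the brute-force comparison is only feasible because the instance is tiny.

```python
import itertools
import math


def tour_length(points, order):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def mst_tour(points):
    """Heuristic tour: preorder walk of a minimum spanning tree (Prim)."""
    n = len(points)
    children = {i: [] for i in range(n)}
    # best[v] = (cheapest known distance from the tree to v, tree vertex giving it)
    best = {i: (math.dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        v = min(best, key=lambda i: best[i][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for u in best:
            d = math.dist(points[v], points[u])
            if d < best[u][0]:
                best[u] = (d, v)
    # Iterative preorder traversal starting from city 0.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(reversed(children[v]))
    return order


def optimal_tour(points):
    """Exact tour by brute force; only feasible for a handful of cities."""
    rest = range(1, len(points))
    return min(
        ([0, *perm] for perm in itertools.permutations(rest)),
        key=lambda order: tour_length(points, order),
    )


# Hypothetical instance, small enough to solve exactly.
points = [(0, 0), (3, 1), (6, 0), (7, 4), (4, 5), (1, 4), (2, 2)]
heuristic = tour_length(points, mst_tour(points))
optimum = tour_length(points, optimal_tour(points))
```

On any such instance `heuristic` should fall between `optimum` and `2 * optimum`. The heuristic runs in polynomial time, while the exact search grows factorially with the number of cities, which is the “well enough, but not perfectly” gap.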
You’re right, of course. My comment was wrong and I should’ve used some other word (perhaps “optimizer”) in place of “maximizer”, because I actually wanted to make a slightly different point…
Imagine someone comes up with a rigorous way to write a program that, when run on any computer anywhere, inspects the surrounding universe and then manipulates it to somehow cause the production of 100 paperclips. This requires insight that we don’t have yet, but it seems to me that any such insight should be easy to weaponize (e.g. ask it to produce a trillion paperclips instead of 100) once it’s discovered. It seems weird to hope that 100 paperclips will be a tractable problem but a trillion would be intractable. That would require an amazing accidental correspondence between “tractable” and “safe”.
Ah, you meant satisficer in this sense of the word. I meant to use it in the sense of what type of system humans are. From the variety of goals we pursue, we are clearly not maximizers or satisficers of any external property of the universe. People regularly avoid reproducing, for example, and even those who do reproduce don’t choose actions that would maximise it (e.g. they don’t donate to sperm banks).
“The type of system humans are” has a big disadvantage compared to mathematically simpler systems like maximizers: it seems harder to reason about such “fuzzy” systems, e.g. prove their safety. How do you convince yourself that a “fuzzy” superintelligence is safe to run?
We have an existence proof of intelligences based upon “the type of system humans are”; we don’t have one for pure maximizers. It is no good trying to develop friendliness theory based upon a pure, easily reasoned-about system if you can’t make an intelligence out of it.
So while it is harder, this may be the sort of system we have to deal with. It is these sorts of questions I wanted to try to answer with the group in my original post.
I’ll try to explain why I am sceptical of maximizer-based intelligences in a discussion post. It is not because they are inhuman.
In practice, maximizers are not things that actually find the maximum value. They are typically hill-climbers of some kind or another. They try to find better values, and so tend to end up at local maxima. Maximizers do not have to be perfect to warrant the name.
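As a rough illustration of the hill-climbing point, here is a minimal sketch of a greedy climber on a made-up objective with two peaks. The names `hill_climb` and `landscape` and the shape of the objective are assumptions for the example only. Where the climber stops depends on where it starts.

```python
def hill_climb(f, x, step=1):
    """Greedy integer hill-climber: move to a neighbour while it improves f."""
    while True:
        best = max((x - step, x + step), key=f)
        if f(best) <= f(x):
            return x  # no neighbour is better: a local maximum
        x = best


def landscape(x):
    # Hypothetical objective with two peaks: a lower one at x=2 (height 4)
    # and a higher one at x=10 (height 9).
    return max(4 - (x - 2) ** 2, 9 - (x - 10) ** 2)


local_peak = hill_climb(landscape, 0)   # starts near the lower peak
global_peak = hill_climb(landscape, 8)  # starts near the higher peak
```

Started at 0 the climber settles on the lower peak, and started at 8 it reaches the higher one. Both runs are “maximizing” in the sense used above, even though only one finds the maximum.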
I’d build satisficers for theoretical reasons, not safety ones. Maximizers, to me, have problems with modifying/improving the model of the world that they are trying to maximize*. Satisficers don’t tend to use models of the world at the lowest level; instead they get proximate signals for the things they are supposed to be maximizing (e.g. dopamine for evolutionary fitness in animal biology) and have to build models of the world that are good at getting the signals. But they really don’t want to maximise those signals, because the signals are not what they are actually supposed to maximise.
Every time I try to say more than this I lapse into a big long post. I’ll see if I can marshal my thoughts somewhat.
*Things like AIXI don’t have this problem because they don’t have to decide how best to modify their model, as they keep all possible models in mind at once. That is one reason I don’t think AIXI is a good guide for AI.
Well, it’s also hard to prove the safety of maximizers. Proving the danger, on the other hand...
Declining to do research on safety grounds isn’t going to help very much either; that just means that another team will make the progress instead.