That is a bit of an old chestnut around here. It is like saying “the rule” for computer software is to crash or go into an infinite loop. If you actually look at the computer software available, it behaves quite differently. Expecting the real world to present you with a random sample from the theoretically-possible options often doesn’t make any sense at all.
Of course it doesn’t make sense, but that isn’t the argument.
Most computer programs more or less work after a lot of time is spent debugging. The problem is that once a program is bug-free enough to get into the subspace of mind designs capable of ‘FOOM’, it has to work exactly right on the first try. Keep in mind that mind design space itself is a small target surrounded by a bunch of crashes and infinite loops.
The idea isn’t that we’d be throwing darts with a necessarily uniform distribution over the dartboard, and that we had better quit forever because the expected payoff calculation comes out negative. The idea is that if an inner bullseye wins big but an outer bullseye kills everybody, you don’t play until you’re really, really, really good.
The way software development usually works is with lots of testing. You use a test harness to restrain the program—and then put it through its paces.
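To make that concrete: in ordinary software work, “restrain the program and put it through its paces” looks something like the sketch below. It is only an illustration; the `./agent` binary, the test cases, and the five-second limit are all hypothetical placeholders.

```python
import subprocess
import tempfile

# Hypothetical candidate program and test cases -- placeholders for illustration only.
CANDIDATE = "./agent"
TEST_CASES = [("ping", "pong"), ("2+2", "4")]

def run_case(prompt: str, expected: str) -> bool:
    # Run the candidate in a throwaway working directory with a hard timeout,
    # so a crash or a hang is contained and simply counts as a failed case.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                [CANDIDATE], input=prompt, capture_output=True,
                text=True, timeout=5, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False  # hang / infinite loop
        return result.returncode == 0 and result.stdout.strip() == expected

if __name__ == "__main__":
    passed = sum(run_case(p, e) for p, e in TEST_CASES)
    print(f"{passed}/{len(TEST_CASES)} cases passed")
```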
The idea that we won’t be able to do that with machine intelligence seems like one of the more screwed-up ideas to come out of the SIAI to me.
The most often-cited justification is the AI box experiments—which are cited as evidence that you can’t safely restrain a machine intelligence—since it will find a way to escape.
This does not seem like a credible position to me. You don’t build your test harness out of humans. The AI box experiments seem to have low relevance to this problem to me.
The forces on the outside will include many humans and machines. Together they will be able to construct pretty formidable prisons with configurable safety levels.
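One way to picture “configurable safety levels” is as an explicit policy table for the sandbox. The level names and the particular restrictions below are assumptions made up for illustration, not a description of any existing system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyLevel:
    name: str
    network_allowed: bool        # may the contained system reach the outside network?
    cpu_seconds: int             # hard compute budget per run
    human_review_required: bool  # must a person sign off on every output?

# Hypothetical policy table: tighter levels trade capability for containment.
SAFETY_LEVELS = {
    "open":      SafetyLevel("open", network_allowed=True, cpu_seconds=3600, human_review_required=False),
    "monitored": SafetyLevel("monitored", network_allowed=True, cpu_seconds=600, human_review_required=True),
    "airgapped": SafetyLevel("airgapped", network_allowed=False, cpu_seconds=60, human_review_required=True),
}
```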
Obviously, we would need to avoid permanent setbacks—but apart from those we don’t really have to “get it right first time”. Many possible problems can be recovered from. Also, it doesn’t mean that we won’t be able to test and rehearse. We will be able to do those things.
Test harnesses might turn out to be very useful, but building them isn’t a trivial task, and I don’t think the development and use of such harnesses can be taken for granted. It’s not just that the AI must be safely contained; it also has to be able to interact with the outside world in a way that can’t be dangerous, yet is still informative enough to decide whether it’s friendly. This seems hard.
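That tension, interaction that is informative but can’t be dangerous, can at least be pictured as a gated channel: nothing leaves the sandbox without passing a mechanical filter and an explicit sign-off. The sketch below is only that picture; the filter rules and the reviewer hook are made up for illustration.

```python
from typing import Callable

def gated_channel(agent_reply: str, reviewer_approves: Callable[[str], bool]) -> str:
    """Release the contained system's reply only if it survives a crude
    mechanical filter and an explicit reviewer decision."""
    MAX_LEN = 2000
    banned = ("http://", "https://", "ssh ")  # crude stand-in for a real output filter

    if len(agent_reply) > MAX_LEN or any(b in agent_reply for b in banned):
        return "[withheld by filter]"
    if not reviewer_approves(agent_reply):
        return "[withheld by reviewer]"
    return agent_reply

# Example: a reviewer who approves everything, for demonstration only.
print(gated_channel("I cooperate.", lambda reply: True))
```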
The original subject of disagreement was “is AI failure the rule or the exception?”. This isn’t a precisely specified question, but it seemed like you were arguing that the “most minds are unfriendly” argument is unimportant because it is either irrelevant or universally understood and accounted for. I think that argument is not universally understood among those who might design an AI, and that failing to understand it would also result in the AI not being placed in a suitably secure test harness.
It’s not just that the AI must be safely contained; it also has to be able to interact with the outside world in a way that can’t be dangerous, yet is still informative enough to decide whether it’s friendly. This seems hard.
Restore it to factory settings between applications of the test suite.
Not remembering what your actions were should make it “pretty tricky” to link those actions to their consequences.
Making the prisons is the more challenging part of the problem—IMO.
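As a concrete reading of “restore it to factory settings between applications of the test suite”: snapshot the system once, and roll every test episode back to that snapshot, so no memory of earlier runs carries over. The snapshot and episode functions below are stand-ins invented for illustration.

```python
import copy

def run_episode(agent_state: dict, episode: str) -> dict:
    """Hypothetical test episode; mutating agent_state stands in for the system learning."""
    agent_state["memory"].append(episode)
    return {"episode": episode, "behaved_safely": True}  # placeholder verdict

def evaluate(factory_state: dict, episodes: list) -> list:
    reports = []
    for ep in episodes:
        # Each episode starts from a fresh copy of the factory snapshot, so the
        # system cannot link its earlier actions to their observed consequences.
        state = copy.deepcopy(factory_state)
        reports.append(run_episode(state, ep))
    return reports

if __name__ == "__main__":
    factory = {"memory": []}
    print(evaluate(factory, ["episode-1", "episode-2"]))
```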
Obviously, we would need to avoid permanent setbacks—but apart from those we don’t really have to “get it right first time”. Many possible problems can be recovered from. Also, it doesn’t mean that we won’t be able to test and rehearse. We will be able to do those things.
That is, you agree with them that there must be zero unrecoverable errors, but you think the set of unrecoverable errors is much smaller than they think it is?
I think we will be able to practice in a sandbox. Not practising makes no sense to me.
Or at least, unless you are better than the average player who is capable of hitting the board.