A (not very useful) AI filter: Suppose you have two AIs of the same architecture, one of which has friendly values and one of which has unfriendly values. You run them through a set of safety tests which you are confident only a friendly AI could pass, and both AIs pass them. You run them through a set of capability/intelligence tests and they both perform equally well. It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI. The unfriendly AI would also need more preparation time at some stage in order to conceal its utility function. Thus, if you measure storage and time costs, whichever AI is smaller and faster is likely the friendly one. However, I don’t think this directly yields anything good in practice, as the amount of extra information content could be very small, especially if the unfriendly AI simplifies its utility function. Also, you need a friendly AI...
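(A minimal sketch of how that filter could be mechanised, purely for illustration; the Candidate type and every field name below are my own assumptions, not anything from the comment.)

```python
# Hypothetical sketch of the "AI filter" heuristic described above.
# All names here are illustrative assumptions, not an actual proposal.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    name: str
    passes_safety: bool        # passed the safety test suite
    passes_capability: bool    # passed the capability/intelligence tests
    size_bytes: int            # storage cost of the trained system
    runtime_seconds: float     # time cost on the test battery

def pick_likely_friendly(a: Candidate, b: Candidate) -> Optional[Candidate]:
    """Among two candidates that both pass every test, prefer the one
    with lower storage and time cost, per the (not very useful) filter."""
    qualified = [c for c in (a, b) if c.passes_safety and c.passes_capability]
    if len(qualified) < 2:
        return None  # the filter only applies when both candidates pass
    # Smaller and faster is taken as weak evidence of friendliness.
    return min(qualified, key=lambda c: (c.size_bytes, c.runtime_seconds))
```

As the comment itself notes, the margin between the two candidates could be tiny, so this tie-break is weak evidence at best.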
Relevant post: Value is Fragile. Truly Friendly goal systems would probably be quite complicated. Unless you make your tests even more complicated and involved (and do it in just the right way—this sounds hard!), the FAI is likely to be outperformed by something with a simpler utility function that nevertheless performs adequately on your test cases.
Yes, I agree that getting the right tests is probably hard. What you need is to achieve the point where the FAI’s utility function + the utility function that fits the test cases compresses better than the unfriendly AI’s utility function + the utility function that fits the test cases.
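(One way to write that compression condition down, using Kolmogorov-complexity notation that is my addition rather than the commenter's: K(·) is description length and U_test is a utility function that merely fits the test cases.)

```latex
% Informal sketch of the compression condition. Notation is assumed,
% not taken from the original comment.
\[
K\!\left(U_{\mathrm{FAI}},\, U_{\mathrm{test}}\right)
\;<\;
K\!\left(U_{\mathrm{UFAI}},\, U_{\mathrm{test}}\right)
\]
% Roughly: the friendly utility function should share more structure with
% whatever it takes to pass the tests than any simpler unfriendly
% alternative does.
```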
To prevent human children from taking a treacherous turn we spend billions: we isolate children from dangers, complexity, perversity, drugs, porn, aggression, and depictions of these. Creating a utility function that covers many years of attentive social education is AI-complete. A utility function is not enough; we also have to create its opposite: a taboo and fear function.
> It seems that the unfriendly AI is in a slightly unfavourable position. First, it has to preserve the information content of its utility function or other value representation, in addition to the information content possessed by the friendly AI.
There are two sorts of unsafe AI: one that cares and one that doesn't care.
The ignorant one is fastest: it only calculates the answer and doesn't care about anything else.
Friend and enemy both have to analyse additional things...
> The ignorant one is fastest: it only calculates the answer and doesn't care about anything else.
Just don’t accidentally give it a problem that is more complex than you expect. Only caring about solving such a problem means tiling the universe with computronium.