Great post! This makes me think of the problem of specification in formal methods: what you managed to formalize is not necessarily what you wanted to formalize. This is why certified software is only as good as the specification that was used for this certification. And that’s one of my main intuitions about the issues of AI safety.
One part of the problem of specification is probably about interfacing, like you write, between the maths of the real world and the maths we can understand/certify. But one thing I feel was not mentioned here is the issue of what I would call unknown ambiguity. One of the biggest difficulties of proving properties of programs and algorithms, is that many parts of the behavior are considered obvious by the designer. Think something like the number of processes cannot be 0, or this variable will never take this value, even if its of the right type. Most of the times, when you add these obvious parts, you can finish the proof. But sometimes the “trivial” was hiding the real problem, which breaks the whole thing.
So I think another scarce resource are people that can explicit all the bits in the system. People that can go to all the nitpick, and rebuild everything from scratch.
Great post! This makes me think of the problem of specification in formal methods: what you managed to formalize is not necessarily what you wanted to formalize. This is why certified software is only as good as the specification that was used for this certification. And that’s one of my main intuitions about the issues of AI safety.
One part of the problem of specification is probably about interfacing, like you write, between the maths of the real world and the maths we can understand/certify. But one thing I feel was not mentioned here is the issue of what I would call unknown ambiguity. One of the biggest difficulties of proving properties of programs and algorithms, is that many parts of the behavior are considered obvious by the designer. Think something like the number of processes cannot be 0, or this variable will never take this value, even if its of the right type. Most of the times, when you add these obvious parts, you can finish the proof. But sometimes the “trivial” was hiding the real problem, which breaks the whole thing.
So I think another scarce resource are people that can explicit all the bits in the system. People that can go to all the nitpick, and rebuild everything from scratch.