It seems to me that if you are unable to build friendliness, it is because you do not sufficiently understand friendliness. And if you don't sufficiently understand it, I do not want you building the program that serves as the check on the release of an AI.
It seems very plausible to me that certifying friendly source code (while still hugely difficult) is much easier than finding friendly source code. For example, maybe we will develop a complicated system S of equations such that, provably, any solution to S encodes the source of an FAI. While finding a solution to S might be intractably difficult for us, verifying a solution x that has been handed to us would be easy—just plug x into S and see if x solves the equations.
ETA: The difficult part, obviously, is developing the system S in the first place. But that would be strictly easier than additionally finding a solution to S on our own.
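To make the certify-versus-find asymmetry concrete, here is a minimal sketch in Python, with a toy boolean constraint system standing in for the hypothetical system S. The names here (constraints, verify, brute_force_find) are purely illustrative, not part of any actual proposal: checking a candidate someone hands us takes one pass over the constraints, while finding a solution from scratch may require searching an exponentially large space.

```python
from itertools import product

# Toy "system S": a set of boolean constraints over three variables.
# A real S would be vastly more complicated; this only illustrates the
# asymmetry between verifying and finding a solution.
constraints = [
    lambda x: x[0] or x[1],        # x0 OR x1
    lambda x: not x[0] or x[2],    # x0 implies x2
    lambda x: x[1] != x[2],        # x1 XOR x2
]

def verify(x):
    """Checking a handed-to-us candidate is cheap: evaluate each constraint once."""
    return all(c(x) for c in constraints)

def brute_force_find(n=3):
    """Finding a solution may require searching the whole 2^n space."""
    for x in product([False, True], repeat=n):
        if verify(x):
            return x
    return None

print(verify((True, False, True)))   # fast: plug the candidate into S and check
print(brute_force_find())            # slow in general: exponential search
```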
It looks like you’re just arguing about P=NP, no?
Even if P!=NP, there still remains the question of whether friendliness is one of those things that we will be able to certify before we can construct an FAI from scratch, but after we can build a uFAI. Eliezer doesn’t think so.