Very little. I don’t like my odds. If Eliezer has provable friendliness theorems but not an AI, it’s in his and everyone’s interest to distribute the generalized theorem to everyone possible so that anyone working on recursive AGI has a chance to make it friendly, which means the algorithms will be checked by many, publicly. If Eliezer has the theorems and an AI ready to implement, there’s nothing I can do about it at all. So why worry?
You could Trust Eliezer and everyone who checked his theorems and then put money towards the supercomputer cluster which is testing every string of bits for ‘friendly AI’.
In fact, at that point that project should become humanity’s only notable priority.
You could Trust Eliezer and everyone who checked his theorems and then put money towards the supercomputer cluster which is testing every string of bits for ‘friendly AI’.
In fact, at that point that project should become humanity’s only notable priority.
Probably not. That project does not sound efficient (to the extent that it sounds unfeasible).
Does the expected return on “this has a one in 2^(10^10) chance of generating a proven friendly AI” lose to anything else in terms of expected result? If so, we should be doing that to the exclusion of FAI research already.
Does the expected return on “this has a one in 2^(10^10) chance of generating a proven friendly AI” lose to anything else in terms of expected result?
Yes. It loses to any sane strategy of finding an FAI that is not “testing every string of bits”.
Testing every string of bits is a nice ‘proof of concept’ translation of “can prove a string of bits is an FAI if it in fact is an FAI” to “actually have an FAI”. It just isn’t one that works in practice unless, say, you have a halting oracle in your pocket.
I write a program which recognizes AGI and halts when it finds one. It tests all programs less than 1 GB. If it halts, I narrow the search space. If it doesn’t, I expand the search space. Each search takes the time used by the halting oracle.
We already have a way to prove that a string of bits is FAI; it is trivial to generalize that to AI.
“I have a string of bits. I do not know if it is FAI or not, but I have a way of testing that is perfectly specific and very sensitive. The odds of an AI researcher writing a FAI are vanishingly small. Should I test it if it was written by an AI researcher? Should I test it if it isn’t?
We already have a way to prove that a string of bits is FAI; it is trivial to generalize that to AI.
Yes, if you have that proof and a halting oracle then you have AGI. That is vastly different to the claim “If I had a halting oracle I would have an AGI”.
That proof was a prerequisite for the idea to try the brute force method of finding the FAI.
Someone could also set an end condition that only an AGI is likely to ever reach and use iterated UTMs to find a program that reached the desired end condition. I’m not sure how to a priori figure out what someone smarter than me would do and test for it, but it isn’t the first thing I would do with a constant-time halting oracle. The first thing I would do is test a program that found additional implications of first order logic and halted when it found a contradiction. Then I would add the inverse of the Gödel statement and test that program.
Very little. I don’t like my odds. If Eliezer has provable friendliness theorems but not an AI, it’s in his and everyone’s interest to distribute the generalized theorem to everyone possible so that anyone working on recursive AGI has a chance to make it friendly, which means the algorithms will be checked by many, publicly. If Eliezer has the theorems and an AI ready to implement, there’s nothing I can do about it at all. So why worry?
You could Trust Eliezer and everyone who checked his theorems and then put money towards the supercomputer cluster which is testing every string of bits for ‘friendly AI’.
In fact, at that point that project should become humanity’s only notable priority.
Probably not. That project does not sound efficient (to the extent that it sounds unfeasible).
Does the expected return on “this has a one in 2^(10^10) chance of generating a proven friendly AI” lose to anything else in terms of expected result? If so, we should be doing that to the exclusion of FAI research already.
Yes. It loses to any sane strategy of finding an FAI that is not “testing every string of bits”.
Testing every string of bits is a nice ‘proof of concept’ translation of “can prove a string of bits is an FAI if it in fact is an FAI” to “actually have an FAI”. It just isn’t one that works in practice unless, say, you have a halting oracle in your pocket.
If I had a halting oracle, I would have AGI.
By your logic, testing an actual AI against the proofs of friendliness isn’t worth the expense, because there might be an inconclusive result.
No you wouldn’t.
No, my logic doesn’t claim that either.
I write a program which recognizes AGI and halts when it finds one. It tests all programs less than 1 GB. If it halts, I narrow the search space. If it doesn’t, I expand the search space. Each search takes the time used by the halting oracle.
We already have a way to prove that a string of bits is FAI; it is trivial to generalize that to AI.
“I have a string of bits. I do not know if it is FAI or not, but I have a way of testing that is perfectly specific and very sensitive. The odds of an AI researcher writing a FAI are vanishingly small. Should I test it if it was written by an AI researcher? Should I test it if it isn’t?
Yes, if you have that proof and a halting oracle then you have AGI. That is vastly different to the claim “If I had a halting oracle I would have an AGI”.
That proof was a prerequisite for the idea to try the brute force method of finding the FAI.
Someone could also set an end condition that only an AGI is likely to ever reach and use iterated UTMs to find a program that reached the desired end condition. I’m not sure how to a priori figure out what someone smarter than me would do and test for it, but it isn’t the first thing I would do with a constant-time halting oracle. The first thing I would do is test a program that found additional implications of first order logic and halted when it found a contradiction. Then I would add the inverse of the Gödel statement and test that program.