Congrats to Eliezer and Marcello on the writeup! It has helped me understand Benja’s “parametric polymorphism” idea better.
There’s a slightly different angle that worries me. What happens if you ask an AI to solve the AI reflection problem?
1) If an agent A_1 generates another agent A_0 by consequentialist reasoning, possibly using proofs in PA, then the future descendants of A_0 also count as consequences. So at least A_0 should not face the problem of “telomere shortening”, because PA can already see the possible consequences of telomere shortening (see the sketch after point 2). But what will A_0 look like? That’s a mystery.
2) To figure out the answer to (1), it’s natural to try devising a toy problem where we could test different implementations of A_1. Benja made a good attempt, then Wei came up with an interesting quining solution to that. Eliezer has now formalized his objection to quining solutions as the “Vingean principle” (no, actually “naturalistic principle”, thx Eliezer), which is a really nice step. Now I just want a toy problem where we’re forced to apply that principle :-) Why such problems are hard to devise is another mystery.
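To unpack what “telomere shortening” means in point 1, here is the shape of the Löbian obstacle as I understand it from the writeup (my paraphrase and notation, not the paper’s exact formulation): for A_1 to license building A_0, A_1’s theory T_1 has to trust A_0’s theory T_0 on the relevant class of safety statements,

$$T_1 \vdash \square_{T_0}\ulcorner \phi \urcorner \rightarrow \phi \quad \text{(for the statements } \phi \text{ that } A_0 \text{ may prove before acting)},$$

and Löb’s theorem says T_1 cannot have this schema with T_0 = T_1 (each instance would let T_1 prove $\phi$ outright), so T_0 must be strictly weaker than T_1. Iterate over generations and the proof strength keeps dropping, which is the shortening. The hope in point 1 is that because A_1 treats all of A_0’s descendants as consequences, that dropping is visible to A_1 up front instead of being rediscovered one generation at a time.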
(Quick note: Wei’s quining violates the naturalistic principle, not the Vingean principle. Wei’s actions were still inside quantifiers but had separate forms for self-modification and action. So did Benja’s original proposal in the Quirrell game, which Wei modified—I was surprised and impressed when Benja’s polymorphism approach carried over to a naturalistic system.)
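For readers trying to keep the two principles apart, a rough schematic (my summary and notation, not a quote from the paper): the Vingean principle says A_1 should license A_0 by quantifying over whatever A_0 might do, rather than by predicting its specific choices; roughly, A_1 relies on a fact of the form

$$\forall b \; \big(\mathrm{Act}_{A_0}(b) \rightarrow \square_{T_0}\ulcorner \bar{b} \rightarrow (\mathrm{NULL} \vee G) \urcorner\big).$$

The naturalistic principle says one and the same criterion should cover every action A_1 takes, whether that action is “move a box” or “replace myself with A_0”, with no special-cased clause for self-modification. On this reading, Wei’s quining solution keeps actions under the quantifier (fine by the Vingean principle) but routes self-modification through its own separate form, which is the naturalistic violation described above.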
Was it UDT1.1 (as a solution to this problem) that violates the Vingean principle?
Also, I’m wondering if Benja’s polymorphism approach solves the “can’t decide whether or not to commit suicide” problem that I described here. Your paper doesn’t seem to address this problem: the criteria of action you use all talk about “NULL or GOAL”, and since suicide leads to NULL, an AI using your criterion of action has trouble deciding whether or not to commit suicide for an even more immediate reason. Do you have any ideas about how your framework might be changed to address this problem?
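To make the worry concrete, here is its shape as I read the paper’s criteria (schematic, not the exact formulation): an action b is licensed only if the agent has a proof that it ends in NULL or GOAL,

$$\mathrm{Act}_{A_1}(b) \text{ only if } \square_{T_1}\ulcorner \bar{b} \rightarrow (\mathrm{NULL} \vee G) \urcorner,$$

and since suicide provably leads to NULL, it meets that license trivially; the criterion permits it without ever weighing it against the alternatives, so nothing in the criterion settles whether to take it.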
Was it UDT1.1 (as a solution to this problem) that violates the Vingean principle?
As I remarked in that thread, there are many possible designs that violate the Vingean principle; AFAICT, UDT 1.1 is one of them.
Also, I’m wondering if Benja’s polymorphism approach solves the “can’t decide whether or not to commit suicide” problem that I described here. Your paper doesn’t seem to address this problem: the criteria of action you use all talk about “NULL or GOAL”, and since suicide leads to NULL, an AI using your criterion of action has trouble deciding whether or not to commit suicide for an even more immediate reason.
Suicide being permitted by the NULL option is a different issue from suicide being mandated by self-distrust. Benja’s TK gets rid of distrust of offspring. Work on reflective/naturalistic trust is ongoing.
Thanks! Corrected.