Regarding direction 17:
There might be some potential drawbacks to ADAM.
I think its possible that some very agentic programs have relatively low g score.
This is due to explicit optimization algorithms being low complexity.
(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc.
We fix M=M0 for these considerations too.)
Consider an utility function ^U.
Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for ^U) and has an approximation parameter n
(this could be AIXI-tl, plus some approximation of ^U; higher n is better approximation).
We will call this approximation of the optimal policy π^Un.
This approximation algorithm has complexity K(π^Un)=C+K(^U)+K(n),
where C>0 is a constant needed to describe the general algorithm (this should not be too large).
We can get better approximation by using a quickly growing function, such as the Ackermann function
with n=A(k,k).
Then we have K(π^UA(k,k))=C+K(^U)+K(A(k,k))≤C+K(^U)+log(k).
What is the g score of this policy?
We have g(π^UA(k,k))=maxU(minπ′:…K(π′)−K(U)).
Let ¯U be maximal in this expression.
If K(¯U)≥K(^U)−C, then
g(π^UA(k,k))=minπ′:EζM0π′(¯U)≥EζM0π^UA(k,k)(¯U)K(π′)−K(¯U)≤K(π^UA(k,k))−K(^U)+C≤2Clog(k).
For the other case, let us assume that if K(¯U)<K(^U)−C,
the policy π¯UA(k,k) is at least as good at maximizing ¯U than π^U)A(k,k).
Then, we have
g(π^UA(k,k))=minπ′:EζM0π′(¯U)≥EζM0π^UA(k,k)(¯U)K(π′)−K(¯U)≤K(π¯UA(k,k))−K(¯U))≤C+log(k).
I don’t think that the assumption
((π¯UA(k,k) maximizes barU better than (π^UA(k,k))
is true for all ^U and k, but plausibly we can select ^U such that this is the case
(exceptions, if they exist, would be a bit weird, and if ADAM working well due to these weird exceptions feels a bit disappointing to me).
A thing that is not captured by approximations such as AIXI-tl are programs that halt but have insane runtime (longer than A(k,k)).
Again, it would feel weird to me if ADAM sort of works because of low-complexity extremely-long-running halting programs.
To summarize, maybe there exist policies π^UA(k,k)
which strongly optimize a non-trivial utility function ^U
with approximation parameter n=A(k,k), but where g(π^UA(k,k))≤2C+log(k)
is relatively small.
Yes, this is an important point, of which I am well aware. This is why I expect unbounded-ADAM to only be a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account instead of K. For example, you can look at the measure C I defined here. More realistically, this measure should be based on the frugal universal prior.
Regarding direction 17: There might be some potential drawbacks to ADAM. I think its possible that some very agentic programs have relatively low g score. This is due to explicit optimization algorithms being low complexity.
(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We fix M=M0 for these considerations too.) Consider an utility function ^U. Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for ^U) and has an approximation parameter n (this could be AIXI-tl, plus some approximation of ^U; higher n is better approximation). We will call this approximation of the optimal policy π^Un. This approximation algorithm has complexity K(π^Un)=C+K(^U)+K(n), where C>0 is a constant needed to describe the general algorithm (this should not be too large).
We can get better approximation by using a quickly growing function, such as the Ackermann function with n=A(k,k). Then we have K(π^UA(k,k))=C+K(^U)+K(A(k,k))≤C+K(^U)+log(k).
What is the g score of this policy? We have g(π^UA(k,k))=maxU(minπ′:…K(π′)−K(U)). Let ¯U be maximal in this expression. If K(¯U)≥K(^U)−C, then g(π^UA(k,k))=minπ′:EζM0π′(¯U)≥EζM0π^UA(k,k)(¯U)K(π′)−K(¯U)≤K(π^UA(k,k))−K(^U)+C≤2Clog(k).
For the other case, let us assume that if K(¯U)<K(^U)−C, the policy π¯UA(k,k) is at least as good at maximizing ¯U than π^U)A(k,k). Then, we have g(π^UA(k,k))=minπ′:EζM0π′(¯U)≥EζM0π^UA(k,k)(¯U)K(π′)−K(¯U)≤K(π¯UA(k,k))−K(¯U))≤C+log(k).
I don’t think that the assumption ((π¯UA(k,k) maximizes barU better than (π^UA(k,k)) is true for all ^U and k, but plausibly we can select ^U such that this is the case (exceptions, if they exist, would be a bit weird, and if ADAM working well due to these weird exceptions feels a bit disappointing to me). A thing that is not captured by approximations such as AIXI-tl are programs that halt but have insane runtime (longer than A(k,k)). Again, it would feel weird to me if ADAM sort of works because of low-complexity extremely-long-running halting programs.
To summarize, maybe there exist policies π^UA(k,k) which strongly optimize a non-trivial utility function ^U with approximation parameter n=A(k,k), but where g(π^UA(k,k))≤2C+log(k) is relatively small.
Yes, this is an important point, of which I am well aware. This is why I expect unbounded-ADAM to only be a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account instead of K. For example, you can look at the measure C I defined here. More realistically, this measure should be based on the frugal universal prior.