I think you may have some confusion along the lines I was discussing here:
I claim that maybe there’s a map-territory confusion going on. In particular, here are two possible situations:
(A) Part of the AGI algorithm involves listing out multiple plans, and another part of the algorithm involves a “grader” that grades the plans.
(B) Same as (A), but also assume that the high-scoring plans involve a world-model (“map”), and somewhere on that map is an explicit (metacognitive / reflective) representation of the “grader” itself, and the (represented) grader’s (represented) grade outputs (within the map) are identical to (or at least close to) the actual grader’s actual grades within the territory.
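To make the (A)/(B) distinction concrete, here's a minimal toy sketch (Python, with a made-up scoring rule; every name below is a hypothetical illustration, not a claim about how an actual AGI algorithm would be built):

```python
from dataclasses import dataclass, field


def actual_grader(plan: str) -> float:
    """The grader in the territory: assigns a grade to a plan (toy rule)."""
    return float(len(plan))  # stand-in for any real evaluation procedure


def represented_grader(plan: str) -> float:
    """The grader as represented on a plan's map. In situation (B), its
    (represented) grades closely match the actual grader's actual grades."""
    return actual_grader(plan)  # identical, for simplicity


@dataclass
class Plan:
    description: str
    # The plan's world-model ("map"); empty in the plain (A)-style case.
    world_model: dict = field(default_factory=dict)


def propose_plans() -> list[Plan]:
    """Part 1 of the algorithm: list out multiple candidate plans."""
    # Plans whose maps never mention the grader (compatible with not-B):
    plain = [Plan("fetch the coffee"), Plan("write the report")]
    # A plan whose map contains an explicit, reflective model of the grader
    # (situation (B)):
    reflective = Plan(
        "craft an output that the grader will score highly",
        world_model={"grader_model": represented_grader},
    )
    return plain + [reflective]


def select_plan(plans: list[Plan]) -> Plan:
    """Part 2 of the algorithm: the actual grader grades the plans."""
    return max(plans, key=lambda p: actual_grader(p.description))


if __name__ == "__main__":
    best = select_plan(propose_plans())
    print(f"chosen plan: {best.description!r}")
    print(f"grader represented on the chosen plan's map: {'grader_model' in best.world_model}")
```

In this toy setup, both (A) and (B) share the same outer loop (propose plans, grade them, pick the argmax); the difference is only whether the winning plan's map happens to contain a model of the grader whose grades track the actual grader's grades.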
[I wasn’t sure when I first wrote that comment, but Alex Turner clarified that he was exclusively talking about (B) not (A) when he said “Don’t align agents to evaluations of plans” and such.]
The brain algorithm fits (A) (or so I claim), but that’s compatible with either (B) or (not-B), depending on what happens during training etc.
Thank you! Then I indeed misunderstood Alex Turner's claim, and with my new understanding of it, I basically agree.