I would be flattered, had your comment been a compliment. ☺
What I meant is that we have a system with a self-correcting world model which solves the “finger pointing at the Moon” problem. It optimizes the world according to its beliefs about the Moon, even though all we could give it was the finger.
To be clear, I don’t necessarily think you’re wrong about how bio brains do it. A lot rests on the word “reliably”. One possible explanation for sexual fetishes is that the human biological mechanism for pointing at sexual partners is quite unreliable (a hypothesis I predict you agree with).
But if we could get a similar mechanism to work reliably, we’d have a mechanism for pointing learning machines at things in the world.
(I note that if this part worked out reliably, alignment would essentially be solved.)