Person A: If you’re going to program a chess-playing agent, it needs a direct innate intrinsic drive to not lose its queen.
Person B: Nah, if losing one’s queen is generally bad, it can learn that fact from experience, or from thinking through the likely consequences in any particular case.
Person A: No, that’s not good enough. Protecting the queen is really important. Maybe the AI will learn from experience to not lose its queen in some situations, but situations change and then it will not be motivated to protect its queen sufficiently.
Obviously, Person B is correct here, because AlphaZero-chess works well.
To my ears, your claim (that an AI without intrinsic drive to satisfy curiosity cannot learn to update its decisions) is analogous to Person A’s claim (that an AI without intrinsic drive to protect its queen cannot learn to do so).
In other words, if it’s obvious to you that the AI is insufficiently updating its decisions, it would be obvious to the AI as well (once the AI is sufficiently smart and self-aware). And then the AI can correct for that.
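To make the Person A / Person B disagreement concrete, here is a minimal, purely illustrative Python sketch (not from the discussion above, and not AlphaZero's actual code). Person A's position corresponds to a hand-tuned reward-shaping term that fires whenever the queen is lost; Person B's corresponds to a sparse terminal reward, with queen protection left to be learned instrumentally, in the spirit of AlphaZero's outcome-only training signal. The `Position` class, the `QUEEN_LOSS_PENALTY` constant, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Toy stand-in for a chess position (hypothetical, for illustration only)."""
    we_have_queen: bool   # does our queen survive after this move?
    game_over: bool       # has the game ended?
    we_won: bool          # if the game is over, did we win?


QUEEN_LOSS_PENALTY = 9.0  # hypothetical hand-tuned constant (Person A's "innate drive")


def shaped_reward(before: Position, after: Position) -> float:
    """Person A: an intrinsic drive, baked in as a reward-shaping term."""
    r = 0.0
    if before.we_have_queen and not after.we_have_queen:
        r -= QUEEN_LOSS_PENALTY  # immediate, built-in aversion to losing the queen
    if after.game_over:
        r += 1.0 if after.we_won else -1.0
    return r


def outcome_only_reward(before: Position, after: Position) -> float:
    """Person B: sparse terminal reward only (draws ignored for brevity).

    The agent can still learn that losing the queen is usually bad, because
    queenless positions tend to lead to losses -- and it can also learn the
    exceptions, such as sound queen sacrifices.
    """
    if after.game_over:
        return 1.0 if after.we_won else -1.0
    return 0.0


if __name__ == "__main__":
    before = Position(we_have_queen=True, game_over=False, we_won=False)
    after = Position(we_have_queen=False, game_over=False, we_won=False)  # queen just dropped
    print("shaped:", shaped_reward(before, after))              # -9.0: punished immediately
    print("outcome-only:", outcome_only_reward(before, after))  # 0.0: judged only by the final result
```

The design question in the dialogue is whether the extra shaping term is necessary, or whether (as with AlphaZero) the outcome-only signal plus enough learning and search is sufficient for the agent to protect its queen when, and only when, that actually serves winning.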
Thanks for explaining your views; this helped me deconfuse myself while I was replying and thinking. I am now drawing lines where curiosity and self-awareness overlap, which also makes me feel the expansive nature of theoretical alignment work: it's very dense and it's easy to drown in information. This discussion felt like taking a whack from a baseball bat and surviving to write this comment. Moreover, getting to Person B still requires knowledge of curiosity and its mechanisms, so I still err on the side of finding out how curiosity works[1] or gets imbued into intelligent systems (us and AI). For me this is very relevant to alignment work.
I'm speculating about a simplified evolutionary cognitive chain in humans: curiosity + survival instincts (including hunger) → intelligence → self-awareness → rationality.