I think evolution-of-humans is kinda like taking a model-based RL algorithm (for within-lifetime learning), and doing a massive outer-loop search over neural architectures, hyperparameters, and also reward functions. In principle (though IMO almost definitely not in practice), humans could likewise do that kind of grand outer-loop search over RL algorithms, and get AGI that way. And if they did, I strongly expect that the resulting AGI would have a “curiosity” term in its reward function, as I think humans do. After all, a curiosity reward-function term is already sometimes used in today’s RL, e.g. the Montezuma’s Revenge literature, and it’s not terribly complicated, and it’s useful, and I think innate-curiosity-drive exists not only in humans but also in much much simpler animals. Maybe there’s more than one way to implement curiosity-drive in detail, but something in that category seems pretty essential for an RL algorithm to train successfully in a complex environment, and I don’t think I’m just over-indexing on what’s familiar.
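A curiosity term of the kind used in the Montezuma's Revenge literature can be made concrete. Random Network Distillation (RND) is one published approach: the intrinsic reward is a trained predictor's error against a fixed, randomly initialized target network, so rarely-visited states earn a larger bonus. A minimal toy version (linear predictor, made-up sizes, not the real RND architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# RND-style curiosity: intrinsic reward is the predictor's error against a
# fixed random target network. Frequently-visited states become well
# predicted and earn little bonus; novel states keep a large bonus.
W_target = rng.normal(size=(8, 4))   # fixed random features (never trained)
W_pred = np.zeros((8, 4))            # trainable linear predictor

def curiosity_bonus(state):
    target_feat = np.tanh(state @ W_target)
    pred_feat = state @ W_pred
    return float(np.mean((target_feat - pred_feat) ** 2))

def train_predictor(state, lr=0.05):
    """One gradient step shrinking the prediction error on `state`."""
    global W_pred
    err = state @ W_pred - np.tanh(state @ W_target)
    W_pred -= lr * np.outer(state, err) * (2 / err.size)

familiar, novel = rng.normal(size=8), rng.normal(size=8)
before = curiosity_bonus(familiar)
for _ in range(300):
    train_predictor(familiar)   # the agent keeps visiting `familiar`
after = curiosity_bonus(familiar)
# `after` is now far below `before`, while the bonus for `novel` stays high,
# so adding this term to the reward pushes the agent toward unexplored states.
```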
Again, this is all pretty irrelevant on my models because I don’t expect that people will program AGI by doing a blind outer-loop search over RL reward functions. Rather, I expect that people will write down the RL reward function for AGI in the form of handwritten source code, and that they will put curiosity-drive into that reward function source code (as they already sometimes do), because they will find that it’s essential for capabilities.
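To make the "handwritten reward function" picture concrete, here is a hypothetical sketch (names and weights invented for illustration) in which curiosity is just one explicit, designer-chosen term among several, rather than something discovered by an outer-loop search:

```python
# Hypothetical hand-written reward function: each drive is an explicit,
# human-legible term with its own weight, chosen by the designer.
REWARD_WEIGHTS = {
    "task": 1.0,       # progress on whatever the AI is asked to do
    "curiosity": 0.1,  # novelty bonus, included because it helps capabilities
}

def total_reward(signals: dict) -> float:
    """Weighted sum of hand-coded reward terms; missing signals count as 0."""
    return sum(REWARD_WEIGHTS[name] * signals.get(name, 0.0)
               for name in REWARD_WEIGHTS)
```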
Separately, insofar as curiosity-drive is essential for capabilities (as I believe), it doesn’t help alignment, but rather hurts alignment, because it’s bad if an AI wants to satisfy its own curiosity at the expense of things that humans directly value. Hopefully that’s obvious to everyone here, right? Parts of the discussion seemed to be portraying AIs-that-are-curious as a good thing rather than a bad thing, which was confusing to me. I assume I was just failing to follow all the unspoken context?
Maintaining uncertainty about the true meaning of an objective is important, but there's a difference between curiosity about the true values one holds, intrinsic curiosity as a component of a value system, and instrumental curiosity as a consequence of an uncertain planning system. I'm surprised to see disagree-votes from MiguelDev and Noosphere; could either of you expand on what you disagree with?
@the gears to ascension Hello! I just think curiosity is a low-level attribute that enables a reaction, and it may be good or bad all things considered. In this regard, curiosity (or studying curiosity) may help with alignment as well.
For example, if an AI is in a situation where it needs to save someone from a burning house, it should be curious enough to consider all the available options, and eventually, if it is aligned, it will choose the actions that result in good outcomes (after also studying all the bad options). That is why I don't agree with the idea that curiosity purely hurts alignment, as described in the comment above.
(I think Nate and Ronny share important knowledge in this dialogue about low-level forces (birthed by evolution) that I think are misunderstood by many.)
Your example is about capabilities (assuming the AI is trying to save me from the fire, will it succeed?) but I was talking about alignment (is the AI trying to save me from the fire in the first place?)
I don’t want the AI to say “On the one hand, I care about Steve’s welfare. On the other hand, man I’m just really curious how people behave when they’re on fire. Like, what do they say? What do they do? So, I feel torn—should I save Steve from the fire or not? Hmm…”
(I agree that, if an AI is aligned, and if it is trying to save me from a burning house, then I would rather that the AI be more capable rather than less capable—i.e., I want the AI to come up with and execute very very good plans.)
As for capabilities, I think curiosity drive is probably essential during early RL training. Once the AI is sufficiently intelligent (including in metacognitive / self-reflective ways), it’s plausible that we could turn curiosity drive off without harming capabilities. After all, it’s possible for an AI to “consider all possible options” not because it’s curious, but rather because it wants me to not die in the fire, and it’s smart enough to know that “considering all possible options” is a very effective means-to-an-end for preventing me from dying in the fire.
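One hedged way to cash out "turn curiosity drive off once the AI is sufficiently intelligent" is an annealing schedule on the curiosity weight: full bonus early in RL training, decayed to zero as instrumental information-seeking takes over. All names and numbers below are invented for illustration:

```python
def curiosity_weight(step: int, anneal_start: int = 50_000,
                     anneal_end: int = 100_000, beta0: float = 0.1) -> float:
    """Hypothetical schedule: full curiosity bonus early in training,
    linearly decayed to zero once the agent is capable enough to seek
    information as a means to an end rather than for its own sake."""
    if step <= anneal_start:
        return beta0
    if step >= anneal_end:
        return 0.0
    frac = (step - anneal_start) / (anneal_end - anneal_start)
    return beta0 * (1.0 - frac)
```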
Humans can do that too. We don't only seek information because we're curious; we can also do it as a means to an end. For example, sometimes I have really wanted to do something, and so then I read a mind-numbingly-boring book that I expect might help me do that thing. Curiosity is not driving me to read the book; on the contrary, curiosity is pushing me away from the book with all its might, because anything else on earth would be more inherently interesting than this boring book. But I read the book anyway, because I really want to do the thing, and I know that reading the book will help. I think an AI which is maximally beneficial to humans would have a similar kind of motivation. Yes it would often brainstorm, and ponder, and explore, and seek information, etc., but it would do all those things not because they are inherently rewarding, but rather because it knows that doing those things is probably useful for what it really wants at the end of the day, which is to benefit humans.
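The boring-book tradeoff can be written as a toy expected-value calculation (all numbers made up): the intrinsic term is negative, but the instrumental term dominates, so the information-seeking action wins anyway.

```python
def read_book_utility(goal_value: float, success_boost: float,
                      boredom_cost: float) -> float:
    """Toy value-of-information calculation, illustration only: reading the
    book raises the chance of achieving the goal by `success_boost`, at an
    intrinsic (curiosity-frustrating) cost of `boredom_cost`."""
    return success_boost * goal_value - boredom_cost

# The book is intrinsically unpleasant (boredom_cost > 0), yet reading is
# still the better move because the instrumental term dominates:
gain = read_book_utility(goal_value=100.0, success_boost=0.2,
                         boredom_cost=5.0)
```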
Once the AI is sufficiently intelligent (including in metacognitive / self-reflective ways), it’s plausible that we could turn curiosity drive off without harming capabilities. After all, it’s possible for an AI to “consider all possible options” not because it’s curious, but rather because it wants me to not die in the fire, and it’s smart enough to know that “considering all possible options” is a very effective means-to-an-end for preventing me from dying in the fire.
Interesting view, but I have to point out that situations change, and many tiny details will become like a back-and-forth discussion inside the AI's network as it performs its tasks. Turning off curiosity will most likely end in the worst outcomes, because the AI may not be able to update its decisions (e.g., "oops, I didn't see that there was a fire hose available" or "oops, I didn't feel the heat of the floor earlier").
Person A: If you’re going to program a chess-playing agent, it needs a direct innate intrinsic drive to not lose its queen.
Person B: Nah, if losing one’s queen is generally bad, it can learn that fact from experience, or from thinking through the likely consequences in any particular case.
Person A: No, that’s not good enough. Protecting the queen is really important. Maybe the AI will learn from experience to not lose its queen in some situations, but situations change and then it will not be motivated to protect its queen sufficiently.
Obviously, Person B is correct here, because AlphaZero-chess works well.
To my ears, your claim (that an AI without intrinsic drive to satisfy curiosity cannot learn to update its decisions) is analogous to Person A’s claim (that an AI without intrinsic drive to protect its queen cannot learn to do so).
In other words, if it’s obvious to you that the AI is insufficiently updating its decisions, it would be obvious to the AI as well (once the AI is sufficiently smart and self-aware). And then the AI can correct for that.
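Person B's position can be demonstrated in miniature. In the toy MDP below, the only reward is the terminal win/loss, with no hand-coded "protect the queen" term, yet tabular Q-learning still learns that hanging the queen is bad. This is an illustrative sketch, not AlphaZero:

```python
import random

random.seed(0)

# A tiny chess-like MDP with ONLY a terminal win/loss reward: no explicit
# "protect your queen" term. State 0: opening choice; state 1: queen kept
# (leads to a win); state 2: queen lost (leads to a loss).
def step(state, action):
    if state == 0:                       # choose whether to hang the queen
        return (1, 0.0, False) if action == 0 else (2, 0.0, False)
    if state == 1:
        return (None, 1.0, True)         # with the queen, you go on to win
    return (None, -1.0, True)            # without it, you go on to lose

Q = {0: [0.0, 0.0], 1: [0.0], 2: [0.0]}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = (random.randrange(len(Q[s])) if random.random() < eps
             else max(range(len(Q[s])), key=lambda i: Q[s][i]))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# The agent ends up valuing "keep the queen" (Q[0][0] high, Q[0][1] negative)
# purely from win/loss feedback, with no intrinsic queen-protection drive.
```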
Thanks for explaining your views; this helped me deconfuse myself while I was replying and thinking. I am now drawing lines where curiosity and self-awareness overlap, which also makes me feel the expansive nature of studying theoretical alignment: it's very dense, and it's easy to drown in information. This discussion felt like a whack from a baseball bat, and I survived to write this comment. Moreover, getting to Person B still requires knowledge of curiosity and its mechanisms; I still err on the side of finding out how curiosity works[1] or gets imbued into intelligent systems (both us and AI). For me, this is very relevant to alignment work.
I'm speculating about a simplified evolutionary cognitive chain in humans: curiosity + survival instincts (including hunger) → intelligence → self-awareness → rationality.