I think “self-awareness” is sometimes a label for one or more features that, among other things, serve to catch errors early and repair the faulty thought process.
Interesting. I thought about this for a while just now, and it occurred to me that self-awareness may just be “having a mental model of oneself.” To be able to model oneself, one needs the general ability to make mental models. That in turn requires the ability to recognize patterns, at all levels of abstraction, in what one is experiencing. To explain this, I need to clarify what “level of abstraction” means. I will try to do this by example.
A creature is hunting, and he discovers that white rabbits taste good. Later he sees a gray rabbit for the first time. The creature’s neural net tells him that it’s a 98% match with the white rabbit, so probably also tasty. But let’s say the gray rabbit turns out to taste bad. The creature has recognized the concrete patterns: 1. White rabbits taste good. 2. Gray rabbits taste bad.
Next week, he tries catching and eating a white bird, and it tastes good. Later he sees a gray bird. To assign any higher probability to the gray bird tasting bad, it seems the creature would have to recognize the abstract pattern: 3. Gray animals taste bad. (Of course it could also just be a negative or bad-tasting association with the color gray, but let’s suppose not—for that possibility could surely be avoided by making the example more complicated.)
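To make the two levels concrete, here is a rough sketch in Python (purely illustrative; the feature names and the majority-vote rule are my own simplifications, not something I’m claiming the creature literally computes). The same three experiences support the concrete patterns 1 and 2 when grouped by color and kind, and the abstract pattern 3 when grouped by color alone:

```python
# Rough sketch: the same three experiences grouped at two levels of abstraction.
# Feature names and the majority-vote rule are illustrative assumptions.
from collections import defaultdict

# Each observation: (color, kind, tasted_good)
observations = [
    ("white", "rabbit", True),   # pattern 1: white rabbits taste good
    ("gray",  "rabbit", False),  # pattern 2: gray rabbits taste bad
    ("white", "bird",   True),
]

def patterns(key, data):
    """Group outcomes by some feature(s) and keep the majority verdict."""
    groups = defaultdict(list)
    for color, kind, good in data:
        groups[key(color, kind)].append(good)
    return {k: sum(v) / len(v) > 0.5 for k, v in groups.items()}

concrete = patterns(lambda c, k: (c, k), observations)  # patterns 1 and 2
abstract = patterns(lambda c, k: c, observations)       # pattern 3: color alone

print(concrete)          # {('white', 'rabbit'): True, ('gray', 'rabbit'): False, ('white', 'bird'): True}
print(abstract["gray"])  # False -> predicts the never-before-seen gray bird tastes bad
```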
Now “animal” is more abstract than “white rabbit” because there’s at least some kind of archetypal white rabbit one can visualize clearly (I’ll assume the creature is conceptualizing in the visual modality for simplicity’s sake).
“Rabbit” (remember that for all the creature knows, this simply means the union of the set “white rabbits” with the set “gray rabbits”) by itself is a tad more abstract, because to visualize it you’d have to see that archetypal rabbit but perhaps with the fur color switching back and forth between gray and white in your mind’s eye.
“Animal” is still more abstract, because to visualize it you’d have to, for instance, see a raccoon, a dog, and a tiger, and something that signals to you something like “etc.” (Naturally, if the creature’s method of conceptualization made visualizing “animal” easier than “rabbit”, “animal” would have the lower level of abstraction for him, and “rabbit” the higher—it all depends on the creature’s modeling methods.)
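One way to picture the abstraction ladder, keeping in mind that this is just an illustrative sketch with placeholder names and that it assumes a concept is simply the union of the sets below it:

```python
# Rough sketch: "rabbit" and "animal" as unions of more concrete sets.
# The individuals are placeholders; the point is only the nesting.
white_rabbits = {"white rabbit"}
gray_rabbits  = {"gray rabbit"}
white_birds   = {"white bird"}
gray_birds    = {"gray bird"}

rabbits = white_rabbits | gray_rabbits   # one step up: a union of concrete sets
birds   = white_birds | gray_birds
animals = rabbits | birds                # another step up: a union of unions

# One crude proxy for "level of abstraction": how many concrete kinds a concept sweeps in.
for name, concept in [("white rabbits", white_rabbits),
                      ("rabbits", rabbits),
                      ("animals", animals)]:
    print(name, len(concept))
# white rabbits 1 / rabbits 2 / animals 4
```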
Now the creature has a mental model. If the model happens to be purely visual, it might look like a Venn diagram: a big circle labeled “animals”, two smaller patches within that circle that overlap with the “white things” circle and the “gray things” circle, and another outside region labeled “bad-tasting things” that sweeps in to encircle “gray animals” but not “white animals.”
The creature might revise that model after he tries eating the gray bird, but for now it’s the prediction model he’s using to determine how much energy to expend on hunting the gray bird in his sights. The model has revisable parts and predictive power, so I would call it a serviceable model—whether or not it’s accurate at this point.
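A toy version of that Venn-diagram model, treating each region as a literal set and each prediction as set membership (the individuals, the energy numbers, and the worth_hunting helper are made up for illustration, not part of the original picture):

```python
# Rough sketch: the Venn-diagram model as literal sets, prediction as membership.
# Individuals, costs, and the worth_hunting helper are made-up illustrations.
animals      = {"white rabbit", "gray rabbit", "white bird", "gray bird"}
gray_things  = {"gray rabbit", "gray bird", "gray rock"}
white_things = {"white rabbit", "white bird", "white rock"}

gray_animals  = animals & gray_things
white_animals = animals & white_things

# The "bad-tasting things" region currently sweeps in exactly the gray animals.
bad_tasting = set(gray_animals)

def worth_hunting(prey, energy_cost, meal_value):
    """Spend energy only if the model predicts the prey is an animal that tastes good."""
    predicted_good = prey in animals and prey not in bad_tasting
    return predicted_good and meal_value > energy_cost

print(worth_hunting("gray bird",  energy_cost=3, meal_value=5))  # False - predicted bad-tasting
print(worth_hunting("white bird", energy_cost=3, meal_value=5))  # True
```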
Since the creature can make mental models like this, making a mental model of himself seems within his grasp. Then we could call the creature “self-aware.” The way he would trace back the thought process that led to a bad idea would be to recognize that the mental model has a flaw—i.e., a failed prediction—and make the necessary changes.
For instance, right now the creature’s mental model predicts that gray animals taste bad. If he eats several gray birds and finds them all to taste at least as good as white birds, he can see how the data point “delicious gray bird” conflicts with the fact that “gray animals” (and hence “gray birds”) is fully encircled by “bad-tasting things” in the Venn diagram in his mind’s eye.
To know how to self-modify most effectively in this case, perhaps the creature has another mental model, built up from past experience and probably at an even higher level of abstraction, which predicts that the most effective course of action in such cases (cases where new data conflicts with the present model of something) is to pull the circle back so that it no longer covers the category that the exceptional data point belonged to. In this case, the creature pulls the circle “bad-tasting things” (now perhaps shaped more like an amoeba) back slightly so that it no longer covers “gray birds,” and now the model is more accurate. So it seems that being able to make mental models of mental models is crucial to optimization or management of failure (and perhaps also sufficient for the task!).
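The revision rule itself can be sketched the same way (again purely illustrative; revise is a hypothetical helper standing in for the creature’s higher-level model, not a claim about how the update actually happens):

```python
# Rough sketch of the revision rule: when a category keeps contradicting the model,
# pull the "bad-tasting" region back so it no longer covers that category.
# `revise` is a hypothetical stand-in for the creature's higher-level model.
gray_birds   = {"gray bird"}
gray_rabbits = {"gray rabbit"}
bad_tasting  = gray_birds | gray_rabbits   # current model: all gray animals taste bad

def revise(bad_region, exception_category):
    """Shrink the bad-tasting region to exclude the category the exceptions belong to."""
    return bad_region - exception_category

# Several gray birds turn out to be delicious, so the higher-level rule fires:
bad_tasting = revise(bad_tasting, gray_birds)

print("gray bird" in bad_tasting)    # False - the region no longer covers gray birds
print("gray rabbit" in bad_tasting)  # True  - the rest of the model is untouched
```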
So again, once the creature turns this mental-modeling ability (based on pattern recognition and, in this case, visual imagery) on himself, he becomes effectively self-aware. This doesn’t seem essential for optimization, but I concede I can’t think of a way to avoid it happening once the ability to form mental models is in place.
This somewhat conflicts with how I’ve used the term in previous posts, but I think this new conception is a more useful definition.
(To taboo “motivation” I’ll give two definitions: a tendency toward certain actions based on 1. the desire to gain pleasure or avoid pain, or 2. any utility function, including goals programmed in by humans in advance. In terms of AI safety, there don’t seem to be significant differences between 1 and 2. [This means I’ve changed my position upon reflection in this post.])