I think maybe what you’re getting at is that if we try to get a machine learning model to predict its own predictions (i.e. we give it a bunch of data which consists of labels that it made itself), it will do this very easily. Agreed. But that doesn’t imply it’s aware of “itself” as an entity.
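To make that first point concrete, here is a minimal sketch of the setup I have in mind (the Ridge model and the synthetic data are my own toy choices, not anything from the discussion): refit a model on labels it generated itself and it matches them almost perfectly, yet nothing in the setup involves any notion of "self".

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=500)  # noisy "world" targets

model = Ridge().fit(X, y)
self_labels = model.predict(X)      # labels the model made itself

refit = Ridge().fit(X, self_labels)  # now train on those self-made labels
print("MSE against the original targets:", np.mean((refit.predict(X) - y) ** 2))
print("MSE against its own labels:      ", np.mean((refit.predict(X) - self_labels) ** 2))
```

The second error is near zero while the first stays at the noise floor, which is the whole content of "it will do this very easily".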
No, but it does imply that it has the information about its own prediction process encoded in its weights, such that there's no reason it would have to encode that information a second time as part of its knowledge of the world.
Furthermore, suppose that we take the weights for a particular model, mask some of those weights out, use them as the labels y, and try to predict them using the other weights in that layer as features x. The model will perform terribly on this because it’s not the task that it was trained for. It doesn’t magically have the “self-awareness” necessary to see what’s going on.
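Concretely, here is roughly the experiment I have in mind, as a toy sketch. Two caveats: I use a simple Ridge probe rather than feeding the weights back through the model itself, and a random matrix stands in for the trained layer, so the details below are my own assumptions rather than anything established in this thread.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
W = rng.normal(size=(256, 64))  # stand-in for one trained layer's weight matrix

y = W[:, -1]   # labels: one masked-out weight per row
X = W[:, :-1]  # features: the other weights in that row

# R^2 sits at or below zero: the probe was never trained for this task,
# so the unmasked weights give it nothing to work with.
print(cross_val_score(Ridge(), X, y, cv=5, scoring="r2").mean())
```

With the random stand-in the failure is true by construction; the claim in the comment is that a real trained layer fares about as badly, because predicting its own weights is simply not the task the model was trained for.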
Sure, but that’s not actually the relevant task here. It may not understand its own weights, but it does understand its own predictive process, and thus its own output, such that there’s no reason it would encode that information again in its world model.
OK, it sounds like we agree then? Like, the Predict-O-Matic might have an unusually easy time modeling itself in certain ways, but other than that, it doesn’t get special treatment because it has no special awareness of itself as an entity?
Edit: Trying to provide an intuition pump for what I mean here—in order to avoid duplicating information, I might assume that something which looks like a stapler behaves the same way as other things I've seen which look like staplers—but that doesn't mean I think all staplers are the same object. It might in some cases be sensible to notice that I keep seeing a stapler lying around and hypothesize that there's just one stapler which keeps getting moved around the office. But that requires that I perceive the stapler as an entity every time I see it, so that entities which were previously separate in my head can be merged. Whereas, arguendo, my prediction machinery isn't necessarily an entity that I recognize; it's more like the water I'm swimming in, in some sense.
I don’t think we do agree, in that I think pressure towards simple models implies that they won’t be dualist in the way that you’re claiming.