But have you ever, even once in your life, thought anything remotely like “I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex’s predictive accuracy.”?
I think I’ve been in situations where I’ve been disoriented by a bunch of random stuff happening and wished that less of it was happening so that I could get a better handle on stuff. An example I vividly recall was being in a history class in high school and being very bothered by the large number of conversations happening around me.
I think humans optimize for a mix of predictability and surprise. If our experiences are too predictable, we get bored, and if they are too unpredictable, we get overwhelmed. (Autistic people are particularly vulnerable to getting overwhelmed, but even neurotypical people can get overwhelmed by too much stimulus.) In RL research, the closest analogue is the explore/exploit tradeoff, classically studied as the multi-armed bandit problem (terrible name). I think this also has something to do with the Free Energy Principle, but that would require understanding Karl Friston and no one understands Karl Friston.
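For readers who haven't met the bandit framing, here is a minimal sketch of the explore/exploit tradeoff using epsilon-greedy action selection on a Bernoulli bandit. The function name `run_bandit`, its parameters, and the payoff probabilities are all made up for illustration, and the link to boredom and overwhelm is only a loose analogy, not a claim about how brains work.

```python
import random

def run_bandit(true_means, epsilon, steps, seed=0):
    """Epsilon-greedy on a Bernoulli bandit (illustrative sketch only).

    true_means: hypothetical payoff probability of each arm.
    epsilon: probability of picking a random arm (exploring) on each step.
    Returns (estimates, pull_counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    counts = [0] * n_arms        # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try something unfamiliar
        else:
            # exploit: pick the arm with the best estimate (ties go to the lowest index)
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return estimates, counts, total

# Pure exploitation never discovers that the second arm always pays out.
greedy = run_bandit([0.0, 1.0], epsilon=0.0, steps=100)
# A little exploration finds the better arm and then mostly sticks with it.
mixed = run_bandit([0.0, 1.0], epsilon=0.1, steps=2000)
```

With `epsilon=0.0` the agent stays on the first arm forever (maximally predictable, zero reward); with `epsilon=1.0` it never settles on anything. Intermediate values trade the two off.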
This comment doesn’t really engage much with your post—there’s a lot there and I thought I’d pick one point to get a somewhat substantive disagreement. But I ended up finding this question and thought that I should answer it.
To tie up this thread: I started writing a more substantive response to a section, but it took a while and was difficult, and I then got invited to dinner, so I probably won’t get around to actually writing it.
This may be “overstimulation”, which definitely happens. (A sort-of-analogous BUT PROBABLY NOT MECHANICALLY SIMILAR situation happens each time I check on AI news these days.)
I added the following to the relevant section:
On reflection, the above discussion overclaims a bit with regard to humans. One complication is that the brain uses internal functions of its own activity as inputs to some of its reward functions, and some of those functions may correspond to or correlate with something like “visual environment predictability”. Additionally, humans run an online reinforcement learning process, and human credit assignment isn’t perfect. If periods of low visual predictability correlate with negative reward in the near future, the human may begin to intrinsically dislike being in unpredictable visual environments.
However, I still think that it’s rare for people’s values to assign much weight to their long-run visual predictive accuracy, and I think this is evidence against the hypothesis that a system trained to make lots of correct predictions will thereby intrinsically value making lots of correct predictions.
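The imperfect-credit-assignment point can be illustrated with a toy model. This is a sketch under strong assumptions, not a model of human learning: a learner keeps one value per environmental cue and updates it with a simple delta rule, crediting each reward to whatever cue was present rather than to its true cause. The function `learn_cue_value`, the cue names, and the `correlation` parameter are all hypothetical.

```python
import random

def learn_cue_value(steps=5000, lr=0.05, correlation=0.8, seed=0):
    """Toy delta-rule learner with crude credit assignment (illustrative only).

    Bad outcomes are generated by a hidden cause that merely co-occurs with
    the "unpredictable" cue with probability `correlation`. The learner only
    sees the cue, so the cue itself absorbs the blame.
    """
    rng = random.Random(seed)
    value = {"predictable": 0.0, "unpredictable": 0.0}
    for _ in range(steps):
        cue = rng.choice(["predictable", "unpredictable"])
        # Hidden cause: bad outcomes are more likely alongside the unpredictable cue.
        p_bad = correlation if cue == "unpredictable" else 1.0 - correlation
        reward = -1.0 if rng.random() < p_bad else 0.0
        # Delta rule: move the cue's learned value toward the observed reward.
        value[cue] += lr * (reward - value[cue])
    return value

values = learn_cue_value()
```

In this sketch the learned value of the "unpredictable" cue ends up more negative than that of the "predictable" cue, even though unpredictability never causes the bad outcome. Nothing here makes the learner value long-run predictive accuracy as such, which fits the distinction drawn above.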
“I think this is evidence against the hypothesis that a system trained to make lots of correct predictions will thereby intrinsically value making lots of correct predictions.”
Note that Yudkowsky said:
“maybe if you train a thing really hard to predict humans, then among the things that it likes are tiny, little pseudo-things that meet the definition of human, but weren’t in its training data, and that are much easier to predict”
which isn’t at all the same thing as intrinsically valuing making lots of correct predictions. A better analogy would be the question of whether humans like things that are easier to visually predict. (Except that’s presumably one of many things that went into human RL, so presumably this is a weaker prediction for humans than it is for GPT-n?)