Suppose you have a randomly activated (not dependent on weather) sprinkler system, and also it rains sometimes. These are two independent causes for the sidewalk being wet, each of which is capable of getting the job done all on its own. Suppose you notice that the sidewalk is wet, so it definitely either rained, sprinkled, or both. If I told you it had rained last night, your probability that the sprinklers went on (given that it is wet) should go down, since the rain already explains the wet sidewalk. If I told you instead that the sprinklers went on last night, then your probability of it having rained (given that it is wet) goes down for a similar reason. This is what "explaining away" means in causal inference: the probability of a cause given its effect goes down when you learn that an alternative cause is present.
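To make the sprinkler example concrete, here is a minimal sketch that computes the relevant conditional probabilities by brute-force enumeration over the four possible worlds. The priors `P_RAIN = 0.3` and `P_SPRINKLER = 0.5` are hypothetical numbers chosen for illustration (the comment gives none), and the model assumes the sidewalk is wet exactly when at least one cause fires.

```python
from itertools import product

# Hypothetical priors, chosen only for illustration.
P_RAIN = 0.3
P_SPRINKLER = 0.5


def joint(rain: bool, sprinkler: bool) -> float:
    """Prior probability of one (rain, sprinkler) world; the causes are independent."""
    pr = P_RAIN if rain else 1 - P_RAIN
    ps = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    return pr * ps


def wet(rain: bool, sprinkler: bool) -> bool:
    # Either cause alone is enough to wet the sidewalk.
    return rain or sprinkler


def prob(query, evidence) -> float:
    """P(query | evidence), summing the joint over all worlds consistent with evidence."""
    num = den = 0.0
    for rain, sprinkler in product([False, True], repeat=2):
        p = joint(rain, sprinkler)
        if evidence(rain, sprinkler):
            den += p
            if query(rain, sprinkler):
                num += p
    return num / den


p_s_wet = prob(lambda r, s: s, lambda r, s: wet(r, s))
p_s_wet_rain = prob(lambda r, s: s, lambda r, s: wet(r, s) and r)
p_r_wet = prob(lambda r, s: r, lambda r, s: wet(r, s))
p_r_wet_sprinkler = prob(lambda r, s: r, lambda r, s: wet(r, s) and s)

print(f"P(sprinkler | wet)       = {p_s_wet:.3f}")            # 0.769
print(f"P(sprinkler | wet, rain) = {p_s_wet_rain:.3f}")       # 0.500
print(f"P(rain | wet)            = {p_r_wet:.3f}")            # 0.462
print(f"P(rain | wet, sprinkler) = {p_r_wet_sprinkler:.3f}")  # 0.300
```

Seeing the wet sidewalk raises each cause above its prior; learning about the other cause then pushes it back down. Under these assumptions it falls exactly to its prior, not to zero, which is one way to see that the explaining away is incomplete.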
In the post, the supposedly independent causes are "hardcoded ant-in-mouth aversion" and "value of eating escamoles", and the effect is negative reward. Realizing that you have a hardcoded ant-in-mouth aversion is like learning that the sprinklers were on last night. The sprinklers being on (incompletely) "explains away" the rain as a cause for the sidewalk being wet. In the same way, the hardcoded ant-in-mouth aversion explains away the-amount-you-value-escamoles as a cause for the low reward.
I'm not totally sure that answers your question. Maybe you were asking, "why model my values as a cause of the negative reward, separate from the hardcoded response itself?" If so, I think I'd rephrase the heart of the question as, "what do the values in this reward model actually correspond to out in the world, if anything? What are the 'real values' which reward is treated as evidence of?" (We've done some thinking about that and might put out a post on it soon.)
Okay, I think one crystallization here for me is that "explaining away" is a matter of degree. (I found the second half of the comment less helpful, but the combo of the first half + John's response is helpful, both for my own updating and for seeing where you guys are currently at.)