Suppose you have an agent with k bits of knowledge that is given n bits of information. You can imagine it's an agent shown a digitized picture. The agent will infer u bits of useful information from those n bits. Critically, u is to be measured in an agent-independent way: u is the length, in bits, of the book the agent would have to write about those n bits for a general audience.
What can be said about the function u(k, n), relating the number of useful bits extracted to both the number of bits presented and the number of bits of background knowledge?
The reason for asking this question is to be able to describe the learning rate of an agent whose learning improves its ability to learn. u is to be measured in an agent-independent way because we want to know how much the agent is learning in an inter-agent currency of memory, not (for certain purposes, anyway) in terms of the actual number of bits by which the agent must grow its own hardware.
It's tricky because u is context-dependent. Consider a thermostat given a picture showing a heat map of the room. The heat map has p pixels, each with b bits. The thermostat has a "mind" capable of representing only the concept "hot/cold".
You could say that it can extract from this picture p bits of information, and store a concept of hot/cold for each of p locations in the room. But it doesn’t have the concept of location, so I won’t let you do that. Then you might say the thermostat can take the heat map as a time sequence, and form p bits of information over time. I will disqualify that on two counts: first, because it isn’t a time sequence, and you are not allowed to divorce the information from its semantics; second, because the thermostat has no concept of time.
In fact, as the thermostat has no memory, it can’t extract more than 1 bit, total, from any amount of information. When k is low, u(k,n) < n.
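To make the low-k case concrete, here is a minimal Python sketch of the thermostat. The threshold and pixel encoding are invented for illustration; the point is only that a memoryless agent whose sole concept is hot/cold emits at most 1 bit no matter how large n is.

```python
# A minimal sketch of the thermostat, with an invented threshold and
# pixel encoding: the input is p pixels of b bits each (n = p*b bits),
# but the output -- and everything the agent retains -- is one bit.
def thermostat(heat_map: list[int], threshold: float = 128.0) -> bool:
    """Collapse an n-bit heat map into a single hot/cold judgment.

    Nothing is retained between calls: there is no state, so no amount
    of repeated input can accumulate more than this one bit.
    """
    return sum(heat_map) / len(heat_map) > threshold

print(thermostat([200, 190, 180, 40]))  # True ("hot") -- and that is all it learned
```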
More surprisingly, when k is high, u(k,n) > n. A human shown a fuzzy photograph that compresses to 20K can extract more than 20K of information from it. That sounds impossible. The trick is that, as I said, u is measured in an agent-independent way. If a human has k0 bits of knowledge and is exposed to n bits of information, the human's new total information is k1 ≤ k0 + n. The human can learn at most n bits of information. But the information represented by those n bits might require the other k0 bits to interpret, and might take more than n bits to represent for someone lacking those k0 bits.
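The compression analogy can be run directly. Below is a toy sketch using Python's zlib with a preset dictionary standing in for the k0 bits of shared knowledge (the particular strings are invented for illustration): the same message costs fewer transmitted bytes when sender and receiver share background, which is the u(k,n) > n effect seen from the other side.

```python
import zlib

# The preset dictionary stands in for the k0 bits of shared knowledge.
shared_knowledge = (
    b"Admiral Samuel Graves commands the British ships. If the British "
    b"cross the Charles by sea, Graves comes to Cambridge."
)

message = b"The British are crossing the Charles; Graves is coming to Cambridge."

# Encoding for a receiver who shares the k0 bits: compress against them.
comp = zlib.compressobj(zdict=shared_knowledge)
with_knowledge = comp.compress(message) + comp.flush()

# Encoding for a receiver with no shared background.
comp = zlib.compressobj()
without_knowledge = comp.compress(message) + comp.flush()

print(len(with_knowledge), len(without_knowledge))  # typically fewer bytes with shared knowledge

# The informed receiver recovers the full message from the shorter signal.
decomp = zlib.decompressobj(zdict=shared_knowledge)
assert decomp.decompress(with_knowledge) == message
```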
For example, if you flash two lanterns instead of one in the steeple of the Old North Church to tell me that the British are crossing the Charles, you have given me only 1 bit of information. If I happen to know that the British ships are commanded by Admiral Graves, then I know both that the British are crossing the Charles, and that Samuel Graves is coming to Cambridge. To communicate this to someone who knew nothing about Admiral Graves would take more than 1 bit of information.
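As a toy rendering of the lantern example (the inference lists are illustrative, not a claim about what anyone historically knew): the transmitted bit merely selects among conclusions the receiver's background knowledge already encodes, and spelling those conclusions out for a reader without that background takes far more than 1 bit.

```python
# Toy model of the lantern signal: the receiver's k0 bits expand a
# 1-bit message into everything it implies.
background = {
    "one_lantern": ["The British are marching out over Boston Neck."],
    "two_lanterns": [
        "The British are crossing the Charles.",
        "Admiral Graves commands the British ships,"
        " so Graves is coming to Cambridge.",
    ],
}

# The signal carries exactly 1 bit: which of two keys was sent.
signal = "two_lanterns"

for fact in background[signal]:
    print(fact)

# Writing these facts out for a reader with no background knowledge
# takes far more than 1 bit; the bit only selects among what the
# receiver's k0 bits already encode.
```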