This is interesting, but it also feels somewhat like a “cheat” compared to the more “real” version of this problem (namely: if I know something about the world and can think intelligently about it, how much leverage can I get out of it?).
A system in which you can pack so much information into an action, and get so much leverage at the cost of so little information, feels like it ought to be artificial. Trivially, this is exactly what makes a lock (real or virtual) work: with one simple key or password, you get to do whatever you want with the contents. But the world as a whole doesn’t seem to work as a locked system (if it did, we would have magic: a tiny, specific formula or gesture yielding massive results down the line).
I wonder if the key here isn’t in the entropy. Your knowing O here allows you to significantly reduce the entropy of the world as a whole. This feels akin to being Maxwell’s demon. In the physical world, though, there are bounds on that sort of observation and action, precisely because being able to perform them would let you violate the second law of thermodynamics. So I wonder whether the conjecture may be true under some additional constraints that capture these common properties of macroscopic closed physical systems (while remaining false in artificial subsystems that we build for the purpose, in which we only care about certain bits and not all the ones defining the underlying physical microstates).
I’m not sure about that; it seems like there are lots of instances where just a few bits of knowledge get you lots of optimization power. Knowing Maxwell’s equations lets you do electronics, and knowing which catalyst to use for the Haber process lets you make lots of food and bombs. If I encoded the instructions for making a nanofactory, that would probably be few bits compared to the amount of optimization you could do with that knowledge.
The important thing is that your relevant information isn’t about the state of the world; it’s about the laws. That’s the evolution map f, not the region O (going by the nomenclature I used in my other comment). Your knowledge about O when using the Haber process is actually roughly proportional to the output: you need to know that inside tank X there is such-and-such precursor, and that it’s pure to a certain degree. That’s like knowing that a certain region of the bit string is prepared purely with 1s. But the laws are interesting because they can have regularities (in fact, we know they do), so they can be represented in compressed form, and you can exploit that knowledge. On the other hand, to actually represent that knowledge in bits of world-knowledge, you’d need to represent the state of all the experiments that were performed and from which that knowledge was inferred and generalized. Volume-wise, that’s still less than the applications… unless you count each application as a further validation of the model that updates your confidence in it, at which point, by definition, the bits of knowledge backing the model always exceed the bits of order you got out of it.
Ah, interesting. If I were going down that path, I’d probably aim to use a Landauer-style argument. Something like, “here’s a bound on mutual information between the policy and the whole world, including the agent itself”. And then a lock/password could give us a lot more optimization power over one particular part of the world, but not over the world as a whole.
… I’m not sure how to make something like that nontrivial, though. Problem is, the policy itself would then presumably be embedded in the world, so I(π,world) is just H(π).
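To spell out why it collapses (just the standard identity, with W the full embedded world state and π a deterministic function of it):

$$I(\pi; W) = H(\pi) - H(\pi \mid W) = H(\pi),$$

since $H(\pi \mid W) = 0$ whenever π is fully determined by W.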
Here’s my immediate thought on it: you define a single world bit string W, and A, O and Y are just designated subsections of it. You are able to know only the contents of O, and can set the contents of A (setting A feels like it reduces the entropy of the whole world, by the way, so you could also postulate that you can only do so by drawing free energy from some other region, your fuel F: for each bit of A you set deterministically, you must randomize two bits of F, so that overall entropy increases). After this, some map W→f(W) is applied repeatedly, evolving the system until it’s time to check whether the region Y is as close as possible to your goal configuration G. I think at that point the properties of the result will depend on the properties of the map. Is it a “lock” map like the one you suggested (compare a region of A with O and, if they’re identical, clone the rest of A into Y, possibly using up F to keep the entropy change positive)? Is it reversible? Is it chaotic? A minimal sketch of this setup follows below.
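Here’s that sketch in code (all region sizes, the fuel bookkeeping, and the specific lock dynamics are arbitrary choices of mine, just to make the setup concrete):

```python
import random

N = 96
A = range(0, 24)    # action region: 8 key bits + 16 payload bits
O = range(24, 32)   # observed region: the agent can read these
Y = range(32, 48)   # target region: compared against the goal G at the end
F = range(48, 96)   # fuel region: randomized to pay for deterministic writes

def lock_map(W):
    """One step of a 'lock' dynamics: if the first len(O) bits of A match O,
    clone the rest of A into Y; otherwise leave Y alone."""
    W = W.copy()
    a = [W[i] for i in A]
    if a[:len(O)] == [W[i] for i in O]:
        for i, b in zip(Y, a[len(O):]):
            W[i] = b
    return W

def act(W, bits):
    """Write `bits` into A deterministically, randomizing two fuel bits of F
    per bit written so that total entropy still goes up on average."""
    W = W.copy()
    fuel = list(F)
    for i, b in zip(A, bits):
        W[i] = b
        for _ in range(2):
            W[fuel.pop()] = random.randint(0, 1)
    return W

W = [random.randint(0, 1) for _ in range(N)]
G = [1] * len(Y)                 # goal configuration for Y: all ones
key = [W[i] for i in O]          # the only thing the agent knows about W
W = act(W, key + G)              # action = key + payload
for _ in range(10):              # evolve the world
    W = lock_map(W)
print("Y reached G:", [W[i] for i in Y] == G)
```

With a lock map the agent reliably hits G knowing only the 8 bits of O, which is exactly the disproportionate leverage under discussion.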
Yeah, not sure; I need to think about it. Reversibility (even treating these as qubits rather than simple bits) might be the key here. In general I think there can’t be any hard rule against lock-like maps, because the real world allows building locks. But maybe there’s some rule along the lines of: if you define the map itself randomly enough, it probably won’t be a lock-map. For example, you could define a map as a series of operations on two bits writing to a third, op(i,j)→k; set aside a region of your world for it, encode the bit indices and operators as bit strings, and the map’s program itself becomes part of the world. Then you can define what makes a map lock-like and ask how probable that occurrence is.
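Here’s a rough sketch of that random-map construction (the sensitivity statistic is my own crude stand-in for “lock-likeness”, and every constant is arbitrary):

```python
# For simplicity the program lives outside W here, rather than being encoded
# into a region of W as suggested above.
import random

N, GATES, STEPS, TRIALS = 32, 64, 8, 200
KEY = range(0, 4)    # stand-in for the key/O region
OUT = range(4, 12)   # stand-in for the target/Y region
OPS = [lambda a, b: a & b,          # AND
       lambda a, b: a | b,          # OR
       lambda a, b: a ^ b,          # XOR
       lambda a, b: 1 - (a & b)]    # NAND

def random_map():
    """A map is a list of gate triples op(i, j) -> k with random wiring."""
    return [(random.randrange(N), random.randrange(N),
             random.randrange(N), random.choice(OPS)) for _ in range(GATES)]

def run(program, W):
    W = W.copy()
    for _ in range(STEPS):
        for i, j, k, op in program:
            W[k] = op(W[i], W[j])
    return W

def key_sensitivity(program):
    """Average fraction of OUT bits that flip when one KEY bit is flipped."""
    total = 0.0
    for _ in range(TRIALS):
        W = [random.randint(0, 1) for _ in range(N)]
        W2 = W.copy()
        W2[random.choice(KEY)] ^= 1
        a, b = run(program, W), run(program, W2)
        total += sum(a[i] != b[i] for i in OUT) / len(OUT)
    return total / TRIALS

print(key_sensitivity(random_map()))
```

A genuine lock map should respond in an all-or-nothing way (near zero for wrong keys, near one for the right key), while a generic random map should give something graded; the open question is how rare the all-or-nothing behavior is under a measure like this.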