Interesting stuff. Some links to the original material:
Original paper (paywalled)
Original paper (free). (Does not include supplementary material.)
A summary article about the paper.
Their software. Demo video; further details are available only on application.
Author 1. Author 2.
On the one hand, these are really smart guys, no question. On the other, toy demos + “this could be the solution to AI!” ⇒ likely to be a damp squib.
I’ve skimmed the paper and read the summary publicity, and I don’t really get how this could be construed as a general intelligence. At best, I think they may’ve encoded a simple objective definition of a convergent AI drive like ‘keep your options open and acquire any kind of influence’ but nothing in it seems to map onto utility functions or anything like that.
I think that’s an accurate informal summary of their basic mechanism. Personally, I’m not impressed by utility functions (or much else in AGI, for that matter), so I don’t rate the fact that they aren’t using them as a point against.
I do, because it seems like in any nontrivial situation, simply grasping for entropies ignores the point of having power or options, which is to aim at some state of affairs which is more valuable than others. Simply buying options is worthless and could well be actively harmful if you keep exposing yourself to risks you could’ve shut down. They mention that it works well as a strategy in Go playing… but I can’t help but think that it must be in situations where it’s not feasible to do any board evaluation at all and where one is maximally ignorant about the value of anything at that point.
As I understand it, it’s more a denial of that claim. The point is to maximise entropy, and values are a means to that end.
Obviously, this is counter-intuitive, since orthodoxy has this relationship the other way around: claiming that organisms maximise correlates of their own power—and the entropy they produce is a byproduct. MaxEnt suggests that this perspective may have things backwards.
What’s your preferred system for encoding values?
“Value” is just another word for “utility”, isn’t it? It’s the whole idea of utility maximisation as a fundamental principle that I think is misguided. No, I don’t have a better idea; I just think that that one is a road that, where AGI is concerned, leads nowhere.
But AGI is not something I work on. There is no reason for anyone who does to pay any attention to my opinions on the subject.
The idea is that entropy can be treated as utility.
Thus entropy maximisation. Modern formulations are largely based on ideas discovered by E. T. Jaynes.
Here is Roderick Dewar explaining the link.
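For concreteness, here is a minimal sketch of the Jaynes-style calculation (my own toy, not the paper’s code or Dewar’s derivation): among all distributions over some discrete states with a fixed mean “energy”, the entropy-maximising one is the Gibbs distribution, and the constraint pins down its temperature parameter. The energy levels and the target mean below are invented purely for illustration.

```python
import numpy as np

energies = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical state "energies"
target_mean = 1.2                            # constraint: the mean energy must equal this

def gibbs(beta):
    # Maximum-entropy distribution subject to a mean-energy constraint:
    # p_i proportional to exp(-beta * E_i).
    w = np.exp(-beta * energies)
    return w / w.sum()

def mean_energy(beta):
    return float(gibbs(beta) @ energies)

# mean_energy(beta) is decreasing in beta, so solve the constraint by bisection.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > target_mean:
        lo = mid
    else:
        hi = mid

p = gibbs(0.5 * (lo + hi))
entropy = -np.sum(p * np.log(p))
print("max-entropy distribution:", np.round(p, 4), "entropy:", round(float(entropy), 4))
```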
I’m aware of MaxEnt (and that’s one reason why, in my other comment, I mentioned the Go playing as probably reflecting a situation of maximum ignorance), but I still do not see how maximizing entropy can possibly lead to fully intelligent utility-maximizing behavior, or, if it is unable to do so, why we would give a damn about what maximizing entropy does. What is the maximal-entropy state of the universe but something we would abhor, like a uniform warm gas? To return to the Go playing: entropy maximization may be a useful heuristic in some positions—but the best Go programs do not purely maximize entropy while ignoring the value of positions or the value of winning.
That’s an argument from incredulity, though. Hopefully, I can explain:
If you have a maximiser of A, the ability to constrain that maximiser, and the ability to generate A, you can use it to maximise B by rewarding the production of B with A. If A = entropy and B = utility, Q.E.D.
Of course if you can’t constrain it you just get an entropy maximiser. That seems like the current situation with modern ecosystems. These dissipate mercilessly, until no energy gradients—or anything else of possible value—is left behind.
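To make that scheme concrete, here is a toy sketch of my own (the action names and payouts are entirely made up): a greedy maximiser of reward A, plus an overseer that only pays out A when the chosen action produces B. A stands in for something like entropy payouts; B stands for whatever we actually value.

```python
import random

actions = ["make_widget", "idle", "burn_fuel"]        # hypothetical actions
produces_B = {"make_widget": True, "idle": False, "burn_fuel": False}

def overseer_grants_A(action):
    # The constraint: A (the maximiser's only reward) is paid out solely for producing B.
    return 1.0 if produces_B[action] else 0.0

def a_maximiser(estimates):
    # The A-maximiser knows nothing about B; it just picks the action with the
    # highest estimated A-payout (ties broken at random).
    best = max(estimates.values())
    return random.choice([a for a, v in estimates.items() if v == best])

estimates = {a: 0.0 for a in actions}
for _ in range(100):
    action = a_maximiser(estimates) if random.random() > 0.1 else random.choice(actions)
    reward = overseer_grants_A(action)
    estimates[action] += 0.1 * (reward - estimates[action])   # running average of A received

print(estimates)   # the A-maximiser ends up favouring the B-producing action
```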
As for the universe ending up as something we would abhor, like a uniform warm gas: by their actions shall ye know them. Humans generate large quantities of entropy, accelerating universal heat death. Their actions clearly indicate that they don’t really care about averting universal heat death.
In general, maximisers don’t necessarily value the eventual results of their actions. A sweet taste maximiser might not value tooth decay and obesity. Organisms behave as though they like dissipating. They don’t necessarily like the dissipated state their actions ultimately lead to.
On the Go point: maximisation is subject to constraints, and Go programs are typically constrained to play Go.
An entropy maximiser whose only actions were placing pieces on Go boards in competitive situations might well attempt to play excellent Go—to make humans feed it power and make copies of it.
Of course, this is a bit different from what the original article is talking about. That refers to “maximizing accessible future game states”. If you know Go, that’s pretty similar to winning. To see how, consider a variant of Go in which both passing and suicide are prohibited.
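For a concrete (non-Go) picture of “maximizing accessible future game states”, here is a toy sketch; it is my own illustration, not the paper’s algorithm, and the grid, walls and horizon are invented. The agent simply moves to whichever neighbouring square leaves the most distinct positions reachable within a short horizon.

```python
from collections import deque

WALLS = {(2, 0), (2, 1), (1, 2)}      # hypothetical obstacles
SIZE, HORIZON = 5, 3
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def legal(pos):
    x, y = pos
    return 0 <= x < SIZE and 0 <= y < SIZE and pos not in WALLS

def reachable_within(start, horizon):
    # Bounded breadth-first search: count distinct squares reachable from
    # `start` in at most `horizon` steps.
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        pos, depth = frontier.popleft()
        if depth == horizon:
            continue
        for dx, dy in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy)
            if legal(nxt) and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)

def option_maximising_move(pos):
    # Greedy choice: move to the neighbour from which the most future
    # positions remain reachable within the remaining horizon.
    neighbours = [(pos[0] + dx, pos[1] + dy) for dx, dy in MOVES]
    neighbours = [n for n in neighbours if legal(n)]
    return max(neighbours, key=lambda n: reachable_within(n, HORIZON - 1))

print(option_maximising_move((0, 0)))  # steers away from the walled-off side of the grid
```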
That reward-B-with-A scheme seems to simply be buck-passing. What does it gain us over simply maximizing B? If we can compute how to maximize a predicate like A, then what stops us from maximizing B directly?
As for Go being “pretty similar to winning”: pretty similar, yet somehow, crucially, not the same thing. If you know Go, consider a board position in which 51% of the board has been filled with your giant false eye; it is your move, and there is one move which turns it into a true eye and many moves which don’t. The winning-maximizing move is to turn your false eye into a true eye, yet this shuts down a huge tree of possible futures in which your false eye is killed, thousands of stones are removed from the board, and you can replay the opening with its beyond-astronomical number of possible futures...
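To spell out that divergence with numbers, here is a trivial sketch; the win probabilities and future-position counts are invented, not taken from any engine or from the paper, and exist only to show that the two objectives can rank the same moves in opposite orders.

```python
import math

moves = {
    # move: (hypothetical win probability, hypothetical count of reachable future positions)
    "fix_false_eye":   (0.95, 1e40),
    "leave_false_eye": (0.55, 1e120),
}

win_maximiser = max(moves, key=lambda m: moves[m][0])
option_maximiser = max(moves, key=lambda m: math.log(moves[m][1]))

print("win-maximising choice:   ", win_maximiser)       # fix_false_eye
print("option-maximising choice:", option_maximiser)    # leave_false_eye
```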
You said you didn’t see how having an entropy maximizer would help with maximizing utility. Having an entropy maximizer would help a lot. Basically, maximizers are very useful things—almost irrespective of what they maximize.
As for the false-eye example: sure. I never claimed that maximizing accessible future states and winning were the same thing.
If you forbid passing, forbid suicide and aim to minimize your opponent’s possible moves, that would make a lot more sense—as a short description of a Go-playing strategy.
So maximizers are useful for maximizing? That’s good to know.
That’s trivializing the issue. The idea is that maximisers can often be repurposed to help other agents (via trade, slavery, etc.).
It sounds as though you originally meant to ask a different question. You can now see how maximizing entropy would be useful, but want to know what advantages it has over other approaches.
The main advantage I am aware of for maximizing entropy is efficiency. If you maximize something else (say, carbon atoms), you try to leave something behind. By contrast, an entropy maximizer would use the carbon atoms as fuel. In a competition, the entropy maximizer would come out on top—all else being equal.
It’s also a pure and abstract type of maximisation that mirrors what happens in natural systems. Maybe it has been studied more.
I already saw how it could be useful in a handful of limited situations—that’s why I brought up the Go example in the first place!
As it stands, it sounds like a limited heuristic, and the claims about intelligence seem grossly exaggerated.
Entropy maximisation purports to explain all adaptation. However, it doesn’t tell us much that we didn’t already know about how to go about making good adaptations. For one thing, entropy maximisation is a very old idea—dating back at least to Lotka, 1922.