but also thinks that intelligence/compression is fundamentally tied into things like beauty and humor in a way that might make the future less bleak & valueless than SingInst folk tend to picture it.
Schmidhuber’s aesthetics paper, going on memory, defines beauty/humor as produced by an optimization process which is maximizing the first derivative of compression rates. That is, agents do not seek the most compressible inputs nor incompressible streams of observations, but rather the streams for which their compression rate is increasing the fastest.
This is a very useful heuristic which is built into us because it automatically accounts for diminishing marginal returns: after a certain point, additional compression becomes hard or pointless, and so the agent will switch to the next stream on which progress can be made.
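To make the heuristic concrete, here is a toy sketch of my own (not code from Schmidhuber's paper): an adaptive order-1 predictor's per-symbol log-loss stands in for compressed length, "progress" is the drop in that loss between successive windows, and the rule simply attends to whichever stream's progress is currently largest. The stream names, window size, and word list are all invented for illustration.

    import math
    import random
    from collections import defaultdict

    class AdaptivePredictor:
        """Order-1 adaptive model with Laplace smoothing. Its per-symbol
        log-loss approximates the length of an arithmetic code, so a
        falling loss means the stream is being compressed better over time."""

        def __init__(self, alphabet_size=27):
            self.alphabet_size = alphabet_size
            self.counts = defaultdict(lambda: defaultdict(int))
            self.prev = None

        def observe(self, symbol):
            row = self.counts[self.prev]
            total = sum(row.values()) + self.alphabet_size        # Laplace smoothing
            bits = -math.log2((row[symbol] + 1) / total)          # code length of this symbol
            row[symbol] += 1
            self.prev = symbol
            return bits

    def progress(losses, window=200):
        """The heuristic's objective: the drop in average code length between
        the previous window and the latest one, i.e. the first derivative of
        the compression rate. Zero until two full windows have been seen."""
        if len(losses) < 2 * window:
            return 0.0
        return (sum(losses[-2 * window:-window]) - sum(losses[-window:])) / window

    rng = random.Random(0)
    words = ["the", "cat", "sat", "on", "the", "mat", "and", "ate", "a", "rat"]
    data = {
        "constant": "a" * 3000,                                            # mastered almost immediately
        "noise": "".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                         for _ in range(3000)),                            # incompressible
        "text": "".join(rng.choice(words) + " " for _ in range(1500))[:3000],  # learnable structure
    }

    predictors = {name: AdaptivePredictor() for name in data}
    losses = {name: [] for name in data}
    for name, stream in data.items():
        for symbol in stream:
            losses[name].append(predictors[name].observe(symbol))

    # At any moment the rule attends to whichever stream shows the fastest
    # progress: early on that should be the learnable 'text' stream; once a
    # stream is mastered (constant) or hopeless (noise), its progress signal
    # decays toward zero and the rule moves on.
    for t in (500, 1500, 3000):
        snapshot = {name: round(progress(losses[name][:t]), 3) for name in losses}
        print(t, snapshot, "->", max(snapshot, key=snapshot.get))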
But, IIRC, this is provably not optimal for utility-maximization because it takes no account of the utility of the various streams: you may be able to make plenty of progress in your compression of Methods of Rationality even when you should be working on your programming or biology or something useful despite their painfully slow rates of progress. (‘Amusing ourselves to death’ comes to mind. If this heuristic was shaped for ancestral environments, then modern art/fiction/etc. is simply indirect wireheading: we think we are making progress in decoding our environment and increasing our reproductive fitness, when all we’re doing is decoding simple micro-environments meant to be decoded.)
I’m not even sure this heuristic is optimal from the point of view of universal prediction/compression/learning, but I’d have to re-read the paper to remember why I had that intuition. (For starters, if it were optimal, it should be derivable from AIXI or Gödel machines or something, but he has to spend much of the paper appealing to more empirical evidence and examples.)
So, given that it’s optimal in neither sense, future intelligences may preserve it—sure, why not? especially if it’s designed in—but there’s no reason to expect it to generically emerge across any significant subset of possible intelligences. Why follow a heuristic as simplistic as ‘maximize rate of compression progress’ when you can instead do some basic calculations about which streams will be more valuable to compress or likely cheap to figure out?
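A minimal sketch of that alternative, with made-up numbers purely for illustration: weight each stream's expected compression progress by how much the resulting understanding is actually worth, rather than chasing raw progress.

    def pick_stream(streams, expected_progress, value_per_bit):
        """Value-weighted stream choice: expected compression progress times
        the worth of understanding that stream. All figures are invented."""
        return max(streams, key=lambda s: expected_progress[s] * value_per_bit[s])

    streams = ["fanfiction", "programming", "biology"]
    expected_progress = {"fanfiction": 0.9, "programming": 0.2, "biology": 0.1}  # fast fun vs. slow grind
    value_per_bit = {"fanfiction": 0.01, "programming": 5.0, "biology": 8.0}     # usefulness of the insight

    print(pick_stream(streams, expected_progress, value_per_bit))  # -> 'programming'
    # The raw 'maximize compression progress' rule would pick the fastest stream instead:
    print(max(streams, key=expected_progress.get))                 # -> 'fanfiction'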
Check out Moshe’s expounding of Steve’s objection to Schmidhuber’s main point, which I think makes the same argument that you do. (One could easily counter that such a wireheading AI would never get off the ground, but I think that debate can be cordoned off.)
ETA: Maybe a counterargument could be made involving Omega or super-Omega promising more compression than any artificial pseudo-random generator… but AFAIK Schmidhuber hasn’t gone that route.
moshez’s first argument sounds like it’s the same thing as my point about it not being optimal for a utility-maximizer, in considerably different terms.
His second argument, about hyperbolic discounting, seems to me to be wrong or irrelevant: I would argue that people are in practice extremely capable of engaging in hyperbolic discounting with regard to the best and most absorbing artworks while over-consuming ‘junk food’ art (and this actually forms part of my essay arguing that new art should not be subsidized).
Maybe a counterargument could be made involving Omega or super-Omega promising more compression than any artificial pseudo-random generator...
I don’t really follow. Is this Omega as in the predictor, or Omega as in Chaitin’s Omega? The latter doesn’t allow any compressor any progress beyond the first few bits due to resource constraints, and if bits of Chaitin’s Omega are doled out, they will have to be at least as cheap to crack as brute-force running the equivalent Turing machine, or else the agent will prefer the brute-forcing and ignore the Omega-bait. So the agent will do no worse than before and possibly better (e.g. if the bits are offered as-is, with no tricky traps or proof-of-work-style schemes).
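For reference (my gloss, not something stated in the thread): for a prefix-free universal machine U, Chaitin’s halting probability is

    \Omega_U = \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}

Its binary expansion is algorithmically random, and knowing its first n bits suffices to decide the halting problem for every program of length at most n, which is the sense in which a resource-bounded compressor cannot extract more than a few bits of genuine progress from it.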
His second argument, about hyperbolic discounting, seems to me to be wrong or irrelevant
Agreed. (I like your essay about junk food art. By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?)
I don’t really follow.
Me neither; I just intuit that there might be interesting non-obvious arguments in roughly that argumentspace.
Omega as in the predictor, or Omega as in Chaitin’s Omega?
I like to think of the former as the physical manifestation of the latter, and I like to think of both of them as representations of God. But anyway, the latter.
beyond the first few bits due to resource constraints
You mean because it’s hard to find/verify bits of Omega? But Schmidhuber argues that certain generalized computers can enumerate bits of Omega very easily, which is why he developed the idea of a super-Omega. I’m not sure what that would imply or if it’s relevant… maybe I should look at this again after the next time I re-familiarize myself with the generalized Turing machine literature.
By the way, did you ever actually do the utilitarian calculations re Nazi Germany’s health policies? Might you share the results?
I was going off a library copy, and thought of it only afterwards; I keep hoping someone else will do it for me.
But Schmidhuber argues that certain generalized computers can enumerate bits of Omega very easily, which is why he developed the idea of a super-Omega.
His jargon is a little much for me. I agree one can approximate Omega by enumerating digits, but what is ‘very easily’ here?