Goodness: an attempt at doing something useful! My first impressions:
The definition of an O-maximizer is pretty central, yet it is presented in a needlessly confusing way, IMO.
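As a rough paraphrase of the idea in my own notation (a sketch, not the paper's formulation): an O-maximizer picks its next action to maximize the expected value of a fixed utility function defined over its future observation sequence,

$$a_t \;=\; \arg\max_{a}\; \mathbb{E}\!\left[\, U(o_{1:T}) \mid o_{1:t-1},\, a \,\right],$$

with the expectation taken over whatever environment model the agent uses to forecast those observations.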
I did something remarkably similar a while back, with my agent cybernetic diagrams and my wirehead analysis.
As far as I can see, with O-maximizers Daniel Dewey is doing the exact same thing that I did.
Daniel’s “value learners” go beyond my own work. I agree that learning an observation utility function has some benefits, though also some problems associated with going back to something more like an RL agent.
I am not sure that the “O-maximizer” terminology is desirable. It overlaps too much with “expected utility maximizer”. Also, the stress on observation appears to me to be unwarranted. The utility function is (or should be permitted to be) a function of the agent’s expected external sensory inputs and its own internal state. Using “observation” emphasizes the former, and suggests that there might be some way of maximizing the unobserved, which there isn’t.
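To make that contrast concrete (again my own sketch notation, not anything from the paper): the observation-centred formulation scores futures with

$$U_{\text{obs}}(o_{1:T}) \quad\text{versus}\quad U(o_{1:T},\, s_{1:T}),$$

where $s_t$ is the agent’s own internal state at time $t$. Both arguments are quantities the agent can actually evaluate, so neither version gives any handle on unobserved external variables.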
A few brief words about whether all this is positive or negative: I think being able to manually choose a utility function is pretty good, and it was not so obvious before that we were going to be able to do that. I also think the “forecasting first” scenario implied by this kind of architecture is positive (wisdom before action). A more modular intelligence allows a divide-and-conquer strategy, which makes machine intelligence look closer. That has some plus points (a possibly slower takeoff), but also some negative ones (less time to prepare).
Anyway: cool! It is interesting to see some SingInst folks arriving on practically the same page as me. I’ve been looking at these kinds of architectures for quite a while now, and I have a whole web site all about them.
Thanks :)
And thanks for pointing out your site on architectures—I’ll have to take a look at that.