> I guess I just don’t see it as a weak point in the doom argument
This is kind of baffling to read, particularly in light of the statement by Eliezer that I quoted at the very beginning of my post.
If the argument is (and indeed it is) that “many superficially appealing solutions like corrigibility, moral uncertainty etc are in general contrary to the structure of things that are good at optimization,” that the way we see this is by doing homework exercises within an expected utility framework, and that the reason we must choose an EU framework in the first place is that “certain structures of cognition are the parts of the agent that are good at stuff and do the work, rather than them being this particular formal thing that they learned for manipulating meaningless numbers as opposed to real-world apples,” the justification being that agents which don’t maximize expected utility are always exploitable, then it seems quite straightforward that if it isn’t true that these agents are exploitable, the entire argument collapses.
Of course, this doesn’t mean the conclusion is now wrong, but you need some reason for reaching it other than the typical money pumps and Dutch books that were being offered up as justifications.
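For concreteness, here is a minimal sketch of the kind of money pump being appealed to, with placeholder goods and a placeholder fee: an agent with cyclic preferences, preferring A to B, B to C, and C to A, will happily pay a small fee at every step of a trade cycle and end up holding exactly what it started with, strictly poorer.

```python
# Minimal money-pump sketch: placeholder goods "A", "B", "C" and a placeholder fee.
FEE = 0.01  # amount the exploiter charges per trade

# Cyclic (intransitive) preferences: the agent strictly prefers the first
# item of each pair to the second.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts_trade(offered, held):
    """The agent trades whenever it strictly prefers the offered item."""
    return (offered, held) in prefers

holding, paid = "C", 0.0
for offered in ["B", "A", "C"] * 3:  # walk the agent around the cycle three times
    if accepts_trade(offered, holding):
        holding, paid = offered, paid + FEE

print(holding, round(paid, 2))  # back to "C", having paid 0.09 for nothing
```

The coherence theorems say, roughly, that an agent immune to this sort of exploitation behaves as if it maximizes expected utility; the question at issue is how much further than that they actually reach.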
> goal-orientedness is a convergent attractor in the space of self-modifying intelligences
This also requires a citation, or at the very least some reasoning; I’m not aware of any theorems that show goal-orientedness is a convergent attractor, but I’d be happy to learn more.
If the reason you think this is true is intuitions about what powerful cognition must be like, but the source of those intuitions was the set of coherence arguments being discussed in this question post, then learning that the coherence arguments do not extend as far as they were purported to should cause you to rethink those intuitions and the conclusions you had previously reached on their basis, as they are now tainted by that confusion.
> It feels similar to pondering the familiar claim of evolution, that systems that copy themselves and seize resources are an attractor state. Sure it’s not 100% proven but it seems pretty solid.
Sure, it seems solid, and it also seems plausible that formalizing this should be straightforward for an expert in the domain. I’m not sure why this is a good analogy to the topic of agentic behavior and cognition.
> goal-orientedness is a convergent attractor in the space of self-modifying intelligences
> This also requires a citation, or at the very least some reasoning; I’m not aware of any theorems that show goal-orientedness is a convergent attractor, but I’d be happy to learn more.
Ok here’s my reasoning:
When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal. So if we diagram the evolution of the universe’s goal-orientedness, it has the shape of an attractor.
There are plenty of entry paths where some intelligence-improving process spits out a goal-oriented general intelligence (as biological evolution did), but no exit path by which a universe whose smartest agent is super goal-oriented ever ceases to be that way.
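To make the shape of that concrete, here is a toy sketch with made-up transition probabilities, treating “the universe’s smartest agent is goal-oriented” as a state with entry paths but no exit path:

```python
import numpy as np

# Toy Markov-chain picture with invented numbers: "goal-oriented" is absorbing
# (no exit path), while non-goal-oriented processes occasionally produce a
# goal-oriented agent (an entry path, as biological evolution did).
states = ["not goal-oriented", "goal-oriented"]
P = np.array([
    [0.95, 0.05],  # 5% chance per step that a goal-oriented agent appears
    [0.00, 1.00],  # once the smartest agent is goal-oriented, it stays that way
])

dist = np.array([1.0, 0.0])  # start with no goal-oriented agent
for _ in range(200):
    dist = dist @ P

for state, prob in zip(states, dist):
    print(f"{state}: {prob:.4f}")  # essentially all mass ends up goal-oriented
```

With any nonzero entry rate and a zero exit rate, nearly all of the long-run probability mass sits in the goal-oriented state, which is what I mean by calling it an attractor.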
> When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal
Because expected value tells us that the more resources you control, the more robustly you can maximize your probability of success in the face of whatever may come at you, and the higher your maximum possible utility is (if you have a utility function without an easy-to-hit max score).
“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal / aligned agents helping you control them.
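As a toy illustration with entirely made-up numbers: if extra resources raise both your probability of achieving your goal and the utility ceiling when you do, the expected-value comparison comes out lopsided in favor of acquiring them.

```python
# Toy expected-value comparison with invented numbers: more resources raise
# both the probability of success and the utility available on success.
def expected_utility(p_success, utility_if_success, utility_if_failure=0.0):
    return p_success * utility_if_success + (1 - p_success) * utility_if_failure

few_resources = expected_utility(p_success=0.3, utility_if_success=10)
more_resources = expected_utility(p_success=0.8, utility_if_success=100)

print(few_resources, more_resources)  # 3.0 vs. 80.0
```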