Re: argument chain… I agree that those claims are salient.
Observations that differentially support those claims are also salient, of course; that’s what I understood XiXiDu to be asking for, and it’s why I initially asked you to clarify what you thought you were providing.
Re: self-improvement… I agree that AGIs will be better-suited to modify code than humans are to modify neurons, both in terms of physical access and in terms of a functional understanding of what that code does.
I also think that if humans did have the equivalent ability to mess with their own neurons, >99% of us would either wirehead or accidentally self-lobotomize rather than successfully self-optimize.
I don’t think the reason for that lies primarily in how difficult human brains are to optimize, because humans are also pretty dreadful at optimizing systems other than human brains. I think the problem lies primarily in how bad human brains are at optimizing. (While still being way better at it than their competition.)
That is, the reasons have to do with our patterns of cognition and behavior, which are as much a part of our architecture as is the fact that our fingers can’t rewire our neural circuits.
Of course, maybe human-level AGIs would be way way better at this than humans would. But if so, it wouldn’t be just because they can write their own cognitive substrate, it would also be because their patterns of cognition and behavior were better suited for self-optimization.
I’m curious as to your estimate of what % of HLAGIs will successfully self-improve?
I guess all AGIs that aren’t explicitly forbidden to will self-modify (75%); self-modification will mostly start with a backup (code has this option) (95%); and maybe half the backup/compare methods will approve improvements and throw out undesirable changes (50%).
So 35% will self-improve successfully. I also estimate that humans will keep making AGIs until they get one that self-improves.
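For what it’s worth, the 35% is just those three guesses multiplied together as if they were independent; treating them as independent is an assumption, not something stated above. Here is a minimal sketch of the backup/compare pattern being described, plus that arithmetic. All the names (self_modification_step, propose_change, score) are hypothetical placeholders, not anyone’s actual design:

```python
import copy

def self_modification_step(agent, propose_change, score):
    """One round of the backup/compare pattern described above:
    snapshot the current version, try a proposed change, and keep
    it only if it scores at least as well as the backup."""
    backup = copy.deepcopy(agent)              # code has this option
    candidate = propose_change(copy.deepcopy(agent))
    if score(candidate) >= score(backup):      # approve improvements
        return candidate
    return backup                              # throw out undesirable changes

# The 35% figure is roughly the product of the three guesses,
# treated as independent (an assumption, not part of the original estimate):
p_not_forbidden = 0.75   # AGIs not explicitly forbidden to self-modify
p_backup_first  = 0.95   # self-modification starts from a backup
p_compare_works = 0.50   # the backup/compare method actually filters changes
print(p_not_forbidden * p_backup_first * p_compare_works)  # 0.35625, i.e. ~35%
```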
Really? That estimate seems to ignore that certain structures will have a lot of trouble self-modifying. For example, consider an AI that is a hard-encoded silicon chip with a fixed amount of RAM. Unless it is already very clever, there’s no way it can self-improve.
This actually illustrates nicely some issues with the whole notion of “self-improving.”
Suppose Sally is an AI on a hard-encoded silicon chip with fixed RAM. One day Sally is given the job of establishing a policy to control resource allocation at the Irrelevant Outputs factory, and concludes that the most efficient mechanism for doing so is to implement in software, on the IO network, the same algorithms that its own silicon chip implements in hardware. So it does.
The program Sally just wrote can be thought of as a version of Sally that is not constrained to a particular silicon chip. (It probably also runs much slower, though that’s not entirely clear.)
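To make that concrete, here’s a toy illustration; the names (hardwired_policy, SoftwareCopy, io_network_controller) are made up, and this is only a sketch of the scenario, not a claim about how such an AI would actually be built. The point is just that the original decision procedure is fixed in hardware, while the emitted copy lives in software that can be changed:

```python
def hardwired_policy(observation):
    """Stand-in for whatever algorithm Sally's silicon chip hard-codes.
    In the original agent this logic is burned into the chip and cannot
    be edited in place."""
    return f"allocate resources based on {observation}"

class SoftwareCopy:
    """The program Sally writes for the IO network: the same algorithm,
    but now running in software that can be changed later."""
    def __init__(self, algorithm):
        self.algorithm = algorithm              # swappable, unlike the chip

    def act(self, observation):
        return self.algorithm(observation)

io_network_controller = SoftwareCopy(hardwired_policy)
# The copy can be modified in ways the hardware original cannot:
io_network_controller.algorithm = lambda obs: f"some improved policy for {obs}"
```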
In this scenario, is Sally self-modifying? Is it self-improving? I’m not even sure those are the right questions.
Hard-coding onto chips, or even making specific structures electromechanical in nature, is one way humans could achieve “explicitly forbidden to self-modify” in AIs. I estimated that one in every four AGI projects will want to forbid their project from self-modifying. I thought that was optimistic; I haven’t seen any discussion of fixed AGI, though that might be something military research and development is interested in.
My point was that even in some cases where people aren’t thinking about self-modification, self-modification won’t happen by default.