There are lots of ways current humans self-improve without much fear, and without things going terribly wrong in practice: medication (e.g. Adderall, modafinil), meditation, deliberate practice of rationality techniques, and more.
There are many more kinds of self-improvement that seem safe enough that many humans will be willing and eager to try as the technologies improve.
If I were an upload running on silicon, I would feel pretty comfortable swapping in improved versions of the underlying hardware I was running on (faster processors, more RAM, better network speed, reliability / redundancy, etc.).
I’d be more hesitant about tinkering with the core algorithms underlying my cognition, but I could probably get pretty far with “cyborg”-style enhancements like grafting a calculator or a search engine directly into my brain. After making the improvements that seem very safe, I might be able to make further self-improvements safely, for two reasons: (a) I have gained confidence and knowledge experimenting with small, safe self-improvements, and (b) the cyborg improvements have made me smarter, giving me the ability to prove the safety of more fundamental changes.
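To make the "cyborg" framing concrete, here is a minimal sketch (all names hypothetical, not from the essay) of what grafting a calculator onto an agent might look like in software: the core cognition is left untouched, and a narrow, well-understood tool is consulted only for subtasks it verifiably handles.

```python
# Minimal sketch of a "cyborg"-style enhancement: the core reasoner is left
# alone, and a narrow, well-understood tool is bolted on for specific subtasks.
# All names here (CoreReasoner, CyborgAgent, ...) are hypothetical.

class CoreReasoner:
    """Stand-in for the unmodified cognition we don't want to tinker with."""
    def answer(self, question: str) -> str:
        return f"(core reasoner's best guess for: {question!r})"

class CyborgAgent:
    def __init__(self, core: CoreReasoner):
        self.core = core

    def answer(self, question: str) -> str:
        # Route narrow, verifiable subtasks to a trusted tool...
        if question.startswith("calc:"):
            expr = question.removeprefix("calc:").strip()
            # eval() restricted to arithmetic only, for the sketch.
            return str(eval(expr, {"__builtins__": {}}, {}))
        # ...and leave everything else to the original, unmodified core.
        return self.core.answer(question)

agent = CyborgAgent(CoreReasoner())
print(agent.answer("calc: 17 * 23"))       # tool handles it -> 391
print(agent.answer("what should I do?"))   # core cognition handles it
```

The relevant design choice is that the enhancement is purely additive: if the tool misbehaves, you delete the routing rule and are back to the original reasoner, which is part of what makes this class of change feel safe.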
Whether we call it wanting to self-improve or not, I do expect that most human-level AIs will at least consider self-improvement for instrumental convergence reasons. It’s probably true that in the limit of self-improvement, the AI will need to solve many of the same problems that alignment researchers are currently working on, and that might slow down any would-be superintelligence for some hard-to-predict amount of time.
If I were an upload running on silicon, I would feel pretty comfortable swapping in improved versions of the underlying hardware I was running on
Uh oh, the device driver for your new virtual cerebellum is incompatible! You’re just going to sit there experiencing the blue qualia of death until your battery runs out.
This is funny, but realistically the human who physically swapped out the device driver for the virtual person would probably just swap the old one back. Generally speaking, digital objects that produce value are backed up carefully and are not too fragile. At later stages of self-improvement, dumb robots could be used for “screwdriver” tasks like this.
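A hedged sketch of the "swap the old one back" point, with hypothetical function names: valuable digital artifacts are usually modified behind a snapshot, so a failed driver upgrade becomes a revert rather than a permanent blue qualia of death.

```python
import shutil
from pathlib import Path

def upgrade_with_rollback(artifact: Path, apply_upgrade, health_check) -> bool:
    """Snapshot a valuable digital artifact, try an upgrade, roll back on failure.

    `apply_upgrade` and `health_check` are hypothetical callables standing in
    for "install the new virtual cerebellum driver" and "is the upload still
    running?".
    """
    backup = artifact.with_suffix(artifact.suffix + ".bak")
    shutil.copy2(artifact, backup)          # back up before touching anything
    try:
        apply_upgrade(artifact)
        if health_check(artifact):
            return True                     # upgrade worked; keep it
    except Exception:
        pass                                # fall through to rollback
    shutil.copy2(backup, artifact)          # restore the known-good version
    return False
```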
This seems right to me, and the essay could probably benefit from saying something about what counts as self-improvement in the relevant sense. I think the answer is probably something like “improvements that could plausibly lead to unplanned changes in the model’s goals (final or sub).” It’s hard to know exactly what those are. I agree it’s less likely that simply increasing processor speed a bit would do it (though Bostrom argues that big speed increases might). At any rate, it seems to me that whatever the set includes, it will be symmetric between human-produced and AI-produced improvements to AI. So for the important improvements, the ones risking misalignment, the arguments should remain symmetrical.