The AI problem is easier in some ways (and significantly harder in others) because we’re not taking an existing system and trying to align it. We want to design the system (and/or the systems that produce that system, i.e. optimization processes) to be aligned in the first place. This can be done through formal work to provide guarantees, lots of code, and lots of testing.
However, doing that for some arbitrary agent, or even just a human, isn’t really a focus of most alignment research. A human has the issue that they’re already misaligned (in a sense), and there are many technological, ethical, and social issues with either retraining them or modifying them to be aligned. If the ideas people had for alignment were about ‘converting’ a misaligned intelligence into an aligned one, then humans could maybe be a test case, but that isn’t really the focus. We’re also only slowly advancing our ability to understand the body and how the brain works. While we have some of the same issues with neural networks, working with them is a lot cheaper and less ethically fraught, and we can rerun them (for non-dangerous networks), etcetera.
Though there has been talk of things like incentives, moral mazes, inadequate equilibria, and more, which are somewhat related to the alignment/misalignment of humans and where they can do better.
Thank you for clarifying! This highlights an assumption about AI so fundamental that I wasn’t previously fully aware that I had it. As you say, there’s a big difference between what to do if we discover AI, vs if we create it. While I think that we as a species are likely to create something that meets our definition of strong AI sooner or later, I consider it vanishingly unlikely that any specific individual or group who goes out trying to create it will actually succeed. So for most of us, especially myself, I figure that on an individual level it’ll be much more like discovering an AI that somebody else created (possibly by accident) than actually creating the thing.
It’s intuitively obvious why alignment work on creating AI doesn’t apply to extant systems. But if the best that the people who care most about it can do is work on created AI, without yet having any breakthroughs that apply to a discovered AI (where we can’t count on knowing how it works, can’t ethically create and then destroy a bunch of instances of it, etc.)… I think I am beginning to see where we get the meme that one begins to think hard about these topics and shortly afterward spends a while being extremely frightened.