To answer the question from an AI Alignment optimist perspective, much of the way humans are aligned is something like RLHF, but currently, a lot of human alignment techniques rely on the assumption that no one has vastly divergent capabilities, especially in IQ or the g-factor. It’s a good thing from our perspective that the differences within a species are far more bounded than the differences between species.
That’s the real problem of AI, in that there’s a non-trivial chance that this assumption breaks, and that’s the difference between AI Alignment and other forms of alignment.
So in a sense, I disagree with TurnTrout on what would happen in practice if we allowed humans to scale their abilities via, say, genetic engineering.
The reason I’m optimistic is that I don’t think this assumption has to be true, and while the Thatcher’s Axiom post implies limits on how much we can expect society to be aligned with itself, the achievable degree of alignment might be much larger than we think.
Pretraining from Human Feedback is one of the first alignment methods that scales well with data, and I suspect it will also scale well with other capabilities.
Basically, it does alignment how it should be done: align the model first, then give it capabilities.
It almost completely solves the major issue of inner alignment, in that we found an objective that is quite simple and myopic, and this means we almost completely avoid deceptive alignment, even if we do online training later or give it a writable memory.
It also has a number of outer alignment benefits for the goal, in that the AI can’t affect its own training distribution or gradient hack, and thus we can recreate a Cartesian boundary that works in the embedded setting.
So in conclusion, I’m more optimistic than TurnTrout or Quintin Pope, but via a different method.
Edit: Almost the entire section down from “The reason I’m optimistic” is a view I no longer hold, and I have become somewhat more pessimistic since this comment.
I don’t believe that a single human being of any level of intelligence could be an x-risk. Happy to debate this point further since I think it is a crux. (Note that I do not believe that a plague could lead to human extinction. Plagues don’t kill 100%.)
AIs are different because a single monolithic AI, or a team of self-aligned AIs, could do things on the scale of an institution, things such as technological breakthroughs (nano), controlling superpower-scale military forces, mass information control that would make Orwell blush, etc. An individual human could never do such things no matter how big his skull was, unless he was hooked up to an AI, in which case it’s not the human that is super intelligent.
Never is a long time. I overall agree with your statement in this comment except for the word ‘never’. I would say, “An individual human currently can’t do such things...”
The key point here is that the technological barriers to x-risks may change in the future. If we do invent powerful nanotech, or substantially advanced genetic engineering techniques & tools, or vastly cheaper and more powerful weapons of some sort, then it may be the case that the barrier-to-entry for causing an x-risk is substantially lower. And thus, what is currently impossible for any human may become possible for some or all humans.
Not saying this will happen, just saying that it could.
Of the three examples I gave, inventing nanotech is the most plausible for our galaxy-brained man, and I suppose meta-Einstein might be able to solve nanotech in his head. However, almost certainly in our timeline nanotech will be solved either by a team of humans or (much more likely at this point) AI. I expect that even ASI will need at least some time in the wetlab to experiment.
The other two examples I gave certainly could not be done by a single human without a brain implant.
I’m also thinking that this is not that meaningful of a debate (at least to me), since in 2023 I think we can reasonably predict that humans will not genetically engineer galaxy brains before the AI revolution resolves.
I don’t believe that a single human being of any level of intelligence could be an x-risk. Happy to debate this point further since I think it is a crux.
It’s partially a crux, but the issue I’m emphasizing is the distribution of capabilities. If things are normally distributed, which seems to be the case in humans, with small corrections, then we can essentially bound how much impact a single misaligned human or a well-dedicated team of misaligned humans can have in overthrowing the aligned order. In particular, this makes a lot more non-scalable heuristics basically work.
If it’s something closer to a power law distribution, perhaps as a result of NGVUD technology (the acronym stands for nanotechnology, genetic engineering, virtual reality, uploading and downloading technology), then you have to have a defense that scales, and without potentially radical changes, such a world would most likely end in the victory of a small team of misaligned humans due to vast capability differentials, similar to how many animal species have gone extinct as a result of human activity.
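To make the normal versus power law contrast concrete, here is a rough sketch, not a model of real human capabilities. It compares how far a one-in-a-billion outlier sits above the median under each distribution. The IQ-style mean of 100 and standard deviation of 15, the Pareto shape of 1.5, and the helper names are all illustrative assumptions of mine.

```python
from statistics import NormalDist


def normal_top_to_median(mu: float, sigma: float, tail: float) -> float:
    """Ratio of the (1 - tail) quantile to the median of a normal distribution."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.inv_cdf(1.0 - tail) / dist.inv_cdf(0.5)


def pareto_top_to_median(alpha: float, tail: float) -> float:
    """Same ratio for a Pareto (power law) distribution with shape alpha.

    The quantile function is Q(p) = x_min * (1 - p) ** (-1 / alpha),
    so x_min cancels out of the ratio.
    """
    top = tail ** (-1.0 / alpha)
    median = 0.5 ** (-1.0 / alpha)
    return top / median


TAIL = 1e-9  # a one-in-a-billion individual (assumed for illustration)

# Assumed parameters: IQ-like scale for the normal case, shape 1.5 for Pareto.
normal_ratio = normal_top_to_median(mu=100.0, sigma=15.0, tail=TAIL)
pareto_ratio = pareto_top_to_median(alpha=1.5, tail=TAIL)

print(f"normal: top outlier is about {normal_ratio:.2f}x the median")
print(f"pareto: top outlier is about {pareto_ratio:,.0f}x the median")
```

Under these assumed parameters the normal outlier is less than twice the median, while the power law outlier is several hundred thousand times the median, which is the sense in which non-scalable heuristics stop working in the second world.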
AIs are different because a single monolithic AI, or a team of self-aligned AIs, could do things on the scale of an institution, things such as technological breakthroughs (nano), controlling superpower-scale military forces, mass information control that would make Orwell blush, etc. An individual human could never do such things no matter how big his skull was, unless he was hooked up to an AI, in which case it’s not the human that is super intelligent.
Hm, I agree that in practice, AI will be better than humans at various tasks, but I believe this is mostly due to quantitative factors, and if we allow ourselves to make the brain as big as necessary, we could be superintelligent too.