Do you think the AI-assisted humanity is in a worse situation than humanity is today? If we are metaphilosophically competent enough that we can make progress, why won’t we remain metaphilosophically competent enough once we have powerful AI assistants?
In your hypothetical in particular, why do the people in the future—who have had radically more subjective time to consider this problem than we have, have apparently augmented their intelligence, and have exchanged massive amounts of knowledge with each other—make decisions so much worse than those that you or I would make today?
From your story it seems like your position is something like:
Humanity is only likely to reach a good outcome because technological constraints force us to continue thinking rather than doing anything irreversible. Removing technological constraints and allowing humans to get what they short-term-want will be bad, because most humans don’t have a short-term preference for deliberation, and many of the things they are likely to do would incidentally but permanently close off the prospect of future course corrections (or lead to value drift).
Is that an accurate characterization?
Other than disagreeing, my main complaint is that this doesn’t seem to have much to do with AI. Couldn’t you tell exactly the same story about human civilization proceeding along its normal development trajectory, never building an AI, but gradually uncovering new technologies and becoming smarter?
I think the relevance to AI is that AI might accelerate other kinds of progress more than it accelerates deliberation. But you don’t mention that here, so it doesn’t seem like what you have in mind. And at any rate, that seems like a separate problem from alignment, which really needs to be solved by different mechanisms. “Corrigibility” isn’t really a relevant concept when addressing that problem.
It seems to me like a potential hidden assumption is whether AGI is the last invention humanity will ever need to make. In the standard Bostromian/Yudkowskian paradigms, we create the AGI, then the AGI becomes a singleton that determines the fate of the universe, and humans have no more input (so we’d better get it right). Whereas the emphasis of approval-directed agents is that we humans will continue to be the deciders; we’ll just have greatly augmented capability.
Do you think the AI-assisted humanity is in a worse situation than humanity is today? If we are metaphilosophically competent enough that we can make progress, why won’t we remain metaphilosophically competent enough once we have powerful AI assistants?
Depends on who “we” is. If the first team that builds an AGI achieves a singleton, then I think the outcome is good if and only if the people on that team are metaphilosophically competent enough, and don’t have that competence corrupted by AIs.
In your hypothetical in particular, why do the people in the future—who have had radically more subjective time to consider this problem than we have, have apparently augmented their intelligence, and have exchanged massive amounts of knowledge with each other—make decisions so much worse than those that you or I would make today?
If the team in the hypothetical is less metaphilosophically competent than we are, or has its metaphilosophical competence corrupted by the AI, then its decisions would turn out worse.
I would say so. Another fairly significant component is my model that humanity updates by having enough powerful people pay enough attention to reasonable people, enough other powerful people pay attention to those powerful people, and everyone else roughly copy the beliefs of the powerful people. So: good memes --> reasonable people --> some powerful people --> other powerful people --> everyone else.
AI would make some group of people far more powerful than the rest, which screws up the chain if that group doesn’t pay much attention to reasonable people. In that case, they (and the world) might just never become reasonable. I think this would happen if ISIS took control, for example.
Other than disagreeing, my main complaint is that this doesn’t seem to have much to do with AI. Couldn’t you tell exactly the same story about human civilization proceeding along its normal development trajectory, never building an AI, but gradually uncovering new technologies and becoming smarter?
I would indeed expect this by default, particularly if one group with one ideology attains decisive control over the world. But if we somehow manage to avoid that (which seems unlikely to me, given the nature of technological progress), I feel much more optimistic about metaphilosophy continuing to progress and propagate throughout humanity relatively quickly.
When I talk about alignment I’m definitely talking about a narrower thing than you. In particular, any difficulties that would exist with or without AI *aren’t* part of what I mean by AI alignment.
Do you think the AI-assisted humanity is in a worse situation than humanity is today?
Lots of people involved in thinking about AI seem to be in a zero-sum, winner-take-all mode (e.g. Macron).
I think there will be significant founder effects from the strategies of the people who create AGI. The development of AGI will be used as an example of which kinds of strategies win during future technological development. Deliberation may tell people that there are better equilibria, but empiricism may tell them that those equilibria are too hard to reach.
Currently the positive-sum norm of freely exchanging scientific knowledge is being tested. For good reasons, perhaps? But I worry for the world if a lack of knowledge-sharing gets cemented as the new norm. It would lead to more arms races and make it harder to coordinate on the important problems. So if the creation of AI leads to the destruction of science as we know it, I think we might be in a worse position.
I, perhaps naively, don’t think it has to be that way.
I think the relevance to AI is that AI might accelerate other kinds of progress more than it accelerates deliberation. [...] And at any rate, that seems like a separate problem from alignment, which really needs to be solved by different mechanisms.
What if the mechanism for solving alignment itself causes differential intellectual progress (in the wrong direction)? For example, suppose IDA makes certain kinds of progress easier than others, compared to no AI, or compared to another AI that’s designed based on a different approach to AI alignment. If that’s the case, it seems that we have to solve alignment (in your narrow sense) and differential intellectual progress at the same time instead of through independent mechanisms. An exception might be if we had some independent solution to differential intellectual progress that could totally overpower whatever influence AI design has on it. Is that what you are expecting?
It seems to me like a potential hidden assumption is whether AGI is the last invention humanity will ever need to make. In the standard Bostromian/Yudkowskian paradigms, we create the AGI, then the AGI becomes a singleton that determines the fate of the universe, and humans have no more input (so we’d better get it right). Whereas the emphasis of approval-directed agents is that we humans will continue to be the deciders; we’ll just have greatly augmented capability.
I don’t see those as incompatible. A singleton can take input from humans.
Depends on who “we” is. If the first team that builds an AGI achieves a singleton, then I think the outcome is good if and only if the people on that team are metaphilosophically competent enough, and don’t have that competence corrupted by AIs.
If the team in the hypothetical is less metaphilosophically competent than we are, or has its metaphilosophical competence corrupted by the AI, then its decisions would turn out worse.
I’m reminded of the lengthy discussion you had with Wei Dai back in the day. I share his picture of which scenarios will get us something close to optimal, his belief that philosophical ignorance might persist indefinitely, his skepticism about the robustness of human reflection, and his skepticism that human values will robustly converge upon reflection.