I generally disagree with the implicit claim “it’s useful to try aligning AI systems via mechanism design on civilization.” This feels like a vastly clumsier version of trying to shape AGIs via black-box gradient descent.
I didn’t imply that mechanism/higher-level-system design and game theory are alone sufficient for successful alignment. But as part of a portfolio, I think they’re indispensable.
The degree to which a person (say, you or I) buys into the importance of mechanism/higher-level-system design for AI alignment probably corresponds to where they land on the cognitivism–enactivism spectrum. A hardcore cognitivist may think that just designing and training AI “in the right way” would be 100% sufficient. A radical enactivist probably endorses the opposite view: designing the right incentives is necessary and sufficient, and designing and training AI “in the right way” is futile if the correct incentives are not in place.
I’m personally very uncertain about where the truth lies on the cognitivism–enactivism spectrum (as, apparently, is the scientific community: respectable cognitive scientists hold completely opposite positions), so from the “meta” rationality perspective, I should hedge my bets and make sure to neglect neither the cognitive representations and AI architecture nor the incentives and the environment.
I also don’t think that the efficient markets we could realistically build pre-AGI would be aligned with human CEV by default.
Pre-AGI, I don’t think there will be enough collective willingness to upend the economy and institute the “right” structures anyway. The most realistic path I see (albeit one with a lot of failure modes) is: an alignment MVP tells us (or helps scientists develop the necessary science to determine) how society, markets, and governance should really be structured, along with the AI architectures and training procedures for the next-gen AGI, and this convinces scientists and decision-makers around the world.
The difference is that after the alignment MVP, first, the sceptical voice (“let’s keep things as they’ve always been; it’s all hype; there is no real intelligence”) should cease entirely, at least among intelligent people. Second, the alignment MVP should show that ending scarcity is a real possibility, which should weaken the grip of status-quo economic and political incentives. Again, a lot could go wrong around this time, but this seems to be the path OpenAI, Conjecture, and probably other AGI labs are aiming for, because they perceive it as the least risky or the only feasible one.
I’m somewhere in the middle of the cognitivist/enactivist spectrum. I think that, e.g., relaxed adversarial training is motivated by trying to make an AI robust, before it leaves the box, to the arbitrary inputs it will receive in the world. I’m sympathetic to the belief that this is computationally intractable; however, it feels more achievable than altering the world in the way I imagine would be necessary without it.
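To make concrete what I mean by robust “before it leaves the box”: below is a minimal sketch of the loop structure of relaxed adversarial training. All names in it (`PseudoInput`, `adversary`, `overseer`) are hypothetical placeholders for illustration, not any lab’s actual implementation. The key idea is that the adversary proposes a description of a class of inputs rather than a concrete input, and an overseer estimates the probability of unacceptable behaviour on that class.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PseudoInput:
    """A description of a class of inputs, rather than a concrete input.

    This is the 'relaxed' part: the adversary only has to describe a
    situation in which the model might misbehave, instead of having to
    construct a concrete misbehaviour-inducing input.
    """
    description: str

def relaxed_adversarial_loss(
    task_loss: Callable[[], float],            # ordinary training loss
    adversary: Callable[[], PseudoInput],      # proposes worst-case descriptions
    overseer: Callable[[PseudoInput], float],  # estimates P(unacceptable behaviour)
    penalty_weight: float = 1.0,
) -> float:
    """Combine the task loss with a penalty for describable failure modes.

    In the actual proposal the overseer's estimate would need to be a
    trainable (e.g. differentiable) signal; here it is just a number.
    """
    pseudo_input = adversary()
    p_unacceptable = overseer(pseudo_input)
    return task_loss() + penalty_weight * p_unacceptable

# Toy usage: the model is penalized whenever the overseer judges that some
# describable class of deployment inputs would make it act unacceptably.
loss = relaxed_adversarial_loss(
    task_loss=lambda: 0.30,
    adversary=lambda: PseudoInput("user requests help with self-exfiltration"),
    overseer=lambda p: 0.05,
)
print(loss)  # 0.35
```

The point of the sketch is only the control flow: the computational burden shifts from exhibiting concrete adversarial inputs to estimating unacceptability over described input classes, which is what makes the approach seem more tractable to me than reshaping the deployment environment itself.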
I’m not an idealist here: I think some civilizational inadequacies should be addressed (e.g., by building better cooperation and commitment mechanisms) concurrently with in-the-box alignment strategies. My main hope, though, is that we can build an in-the-box corrigible AGI that allows in-deployment modification.