Different parts of me get excited about this in different directions.
On the one hand, I see AI alignment as highly solvable. When I look across a dozen different subdisciplines in machine learning, generative modeling, natural language processing, cognitive science, computational neuroscience, predictive coding, etc., I feel like I can sense the faint edges of a solution to alignment that is already holographically distributed among collective humanity.
Getting AGI that has the same natural abstractions that biological brains converge on, that uses interpretable computational architectures for explicit reasoning, that continuously improves its internal predictive models of the needs and goals of other agents within its sphere of control and uses these models to motivate its own behavior in a self-correcting loop of corrigibility, that cares about the long-term survival of humanity and the whole biosphere; all of this seems like it is achievable within the next 10-20 years if we could just get all the right people working together on it. And I’m excited at the prospect that we could be part of seeing this vision come to fruition.
On the other hand, I realize that humanity is full of bad-faith actors and otherwise good people whose agendas are constrained by perverse local incentives. Right now, deep learning is vulnerable to adversarial examples, completely failing to recognize what it’s looking at when the texture changes slightly. Natural language understanding is still brittle, with transformer models probably being a bit too general-purpose for their own good. Reinforcement learning still falls prey to Goodharting, which would almost certainly lead to disaster if scaled up sufficiently. Honestly, I don’t want to see an AGI emerge that’s based on current paradigms just hacked together into something that seems to work. But I see groups moving in that direction anyway.
Without an alignment-adjacent paradigm shift that offers performance competitive with existing models, the major developers of AI are going to continue down a dangerous path, while no one else has the resources to compete. In this light, seeing the rapid progress of the last decade from AlexNet to GPT-3 and DALL-E 2 creates the sort of foreboding excitement that you talked about here. The train is barreling forward at an accelerating pace, and reasonable voices may not be loud enough over the roar of the engines to get the conductor to switch tracks before we plunge over a cliff.
I’m excited for the possibilities of AGI as I idealize it. I’m dreading the likelihood of a dystopic future with no escape if existing AI paradigms take over the world. The question becomes, how do we switch tracks?