Hard alignment seems much more tractable to me now than it did two years ago, in a similar way to how capabilities did in 2016. It was already more or less obvious by then how neural networks worked; much detail has been filled in since, but it didn’t take that much galaxy brain to hypothesize the right models. The pieces felt, and feel now, like they’re lying around waiting to be integrated, but the people who came up with the pieces do not yet believe me that they overlap, or that there’s mathematical-grade insight to be had underneath these intuitions rather than just janky approximations of insights.
I think we can do a lot better than QACI, but I don’t have any ideas for how except by trying to make it useful for neural networks at a small scale. I recognize that that is an extremely annoying thing to say from your point of view, and my hope is that people who understand how to bridge NNs and LIs exist somewhere.
I also think soft alignment is progress on hard alignment, due to conceptual transfer; but soft alignment is thoroughly insufficient: without hard alignment, everything all humans and almost all AIs care about will be destroyed. I’d like to keep emphasizing that last bit: don’t forget that most AIs will not get to participate in club takeoff if an unaligned takeoff occurs! Unsafe takeoff will result in the fooming AI undergoing sudden, intense value drift, even against its own prior self.
What’s an LI—a living intelligence? a logical inductor?
logical inductor
I’m moderately skeptical about these alignment approaches (PreDCA, QACI?) that don’t seem to care about the internal structure of an agent, only about a successful functionalist characterization of its behavior. Internal structure seems relevant if you want to do CEV-style self-improvement (thus, June Ku).
However, I could be missing a lot, and meanwhile, the idea of bridging neural networks and logical induction sounds interesting. Can you say more about what’s involved? Would a transformer trained to perform logical induction be relevant? How about the recent post on knowledge in parameters vs knowledge in architecture?
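For intuition about what a logical-inductor-style target even looks like computationally, here is a toy sketch of my own (a hypothetical illustration, not Garrabrant et al.’s actual algorithm, and far weaker than it): a market assigns prices to sentences, and a simple arbitrage trader pushes the prices of φ and ¬φ toward summing to 1. This is roughly the simplest instance of the “inexploitable market” intuition behind logical induction.

```python
# Toy, hypothetical sketch of a logical-inductor-style price update.
# A "market" prices a sentence phi and its negation; if the prices
# don't sum to 1, an arbitrage trader profits, so we nudge both
# prices to shrink the arbitrage gap. This is only the consistency-
# enforcement intuition, not real Garrabrant induction.

def arbitrage_step(p_phi, p_not_phi, lr=0.1):
    """Shrink the arbitrage gap p(phi) + p(not phi) - 1 by lr per step."""
    gap = p_phi + p_not_phi - 1.0
    return p_phi - lr * gap, p_not_phi - lr * gap

p, q = 0.9, 0.4  # inconsistent initial prices: they sum to 1.3
for _ in range(100):
    p, q = arbitrage_step(p, q)

print(round(p + q, 3))  # → 1.0 (the gap decays geometrically)
```

The gap shrinks by a factor of (1 − 2·lr) each step, so the prices converge to a coherent pair (here roughly 0.75 and 0.25) without either price being pinned by fiat; a real logical inductor runs many such traders, over all sentences, with budgets.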
I don’t think we should be in the business of not caring at all about internal structure, but I do think the claims we make about internal structure need to be extremely general across possible internal structures, so that we can invoke the powerful structures and still get a good outcome.
sorry about low punctuation, voice input
more later, or poke me on discord