I’m moderately skeptical about these alignment approaches (PreDCA, QACI?) which don’t seem to care about the internal structure of an agent, only about a successful functionalist characterization of its behavior. Internal structure seems to be relevant if you want to do CEV-style self-improvement (thus, June Ku).
However, I could be missing a lot, and meanwhile, the idea of bridging neural networks and logical induction sounds interesting. Can you say more about what’s involved? Would a transformer trained to perform logical induction be relevant? How about the recent post on knowledge in parameters vs knowledge in architecture?
I don’t think we should be in the business of not caring at all about the internal structure, but I think the claims we need to make about the internal structure need to be extremely general across possible internal structures, so that we can invoke the powerful structures and still get a good outcome.
sorry about low punctuation, voice input
more later, or poke me on discord