I am also worried about where ill-considered regulation could take us. I think the best hopes for alignment all start by using imitation learning to clone human-like behavior. Broad limitations on what sorts of human-produced data are usable for training will likely make the behavior-cloning process less robust and less likely to transmit the subtler dimensions of human values/cognition to the AI.
Imitation learning is the primary mechanism by which we transmit human values to current state-of-the-art language models. Greatly restricting the pool of people whose outputs can inform the AI's instantiation of values is both risky and (IMO) potentially unfair, since it denies many people the opportunity for their values to influence the behaviors of the first transformative AI systems.
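To make the mechanism concrete: behavior cloning for language models is just maximum-likelihood next-token prediction on human-produced text, so whatever values the model absorbs come from whoever is represented in the training pool. Here is a minimal sketch of that objective; the toy corpus, model, and hyperparameters are illustrative assumptions, not any lab's actual training setup.

```python
# Minimal behavior-cloning sketch: fit a tiny next-token model to
# human-written text by maximizing its likelihood (cross-entropy).
# Corpus, model size, and hyperparameters here are toy assumptions.
import torch
import torch.nn as nn

human_text = "be kind. be honest. be kind. be honest. "  # stand-in for human data
vocab = sorted(set(human_text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in human_text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # predict each next char

for step in range(200):
    logits = model(x)
    # Behavior-cloning objective: cross-entropy against the human tokens.
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Shrinking `human_text` to a narrower slice of people directly shrinks the distribution of behaviors, and hence values, the model can learn to imitate.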
I would also add that the premise of Katja’s argument seems like a pretty thin strawman of the opposition:
Some people: AI might kill everyone. We should design a godlike super-AI of perfect goodness to prevent that.
Others: wow that sounds extremely ambitious
Some people: yeah but it’s very important and also we are extremely smart so idk it could work
[Work on it for a decade and a half]
Some people: ok that’s pretty hard, we give up
Others: oh huh shouldn’t we maybe try to stop the building of this dangerous AI?
Some people: hmm, that would involve coordinating numerous people—we may be arrogant enough to think that we might build a god-machine that can take over the world and remake it as a paradise, but we aren’t delusional