Supervised learning has lots of commercial applications, including cases where it competes with humans. The fact that RL doesn’t suggests to me that, if you can apply both to a problem, RL is probably the inferior approach.
Good point. New argument: your argument could have been made in support of GOFAI twenty years ago (“Symbol-manipulation programs have had lots of commercial applications, but neural nets have had almost none; therefore the former is a more generally powerful and promising approach to AI than the latter”), but not only does it seem wrong in retrospect, it was probably not a super powerful argument even then. Analogously, I think it is too early to tell whether RL or supervised learning will be more useful for powerful AI.
Simulation of what? Selection of what? I don’t think those count for my purposes, because they punt on the question. (E.g. if you are simulating an agent, then you have an agent architecture; if you are selecting over things, and the thing you select is an agent...) I think “computer program” is too general, since it includes agent architectures as a subset. These categories are fuzzy of course, so maybe I’m confused, but it still seems to make sense in my head.
(Ah, interesting, it seems that you want to standardize “agent-like architecture” in the opposite way from how I want to. Perhaps this underlies our disagreement. I’ll try to follow your definition henceforth, but remember that everything I’ve said previously was with my definition.)
Good point to distinguish between the two. I think that all the bullet points, to varying extents, might still qualify as genuine benefits in the sense you are talking about. But they might not; it depends on whether there is another policy just as good along the path that cutting-edge training tends to explore. I agree #2 is probably not like this, but I think #3 might be. (Oh wait, no, it’s your terminology I’m using now… in that case, I’ll say “#3 isn’t an example of an agent-like architecture being beneficial to text prediction, but it might well be a case of a lower-level architecture, exactly like an agent-like architecture except lower-level, being beneficial to text prediction, supposing that it’s not competitive to predict text except by simulating something like a human writing.”)
I love your idea of generating a list of concrete scenarios of accidental agency! These 3.5 are my contributions off the top of my head; if I think of more, I’ll come back and let you know. And I’d love to see your list if you have a draft somewhere!
I agree that the “universal prior is malign” thing could hurt a non-agent architecture too, and that some agent architectures wouldn’t be susceptible to it. Nevertheless, it is an example of how you might get accidental agency, not in your sense but in mine: a non-agent architecture could turn out to have an agent as a subcomponent that ends up taking over its behavior at important moments.