TurnTrout comments on Many arguments for AI x-risk are wrong

TurnTrout 11 Mar 2024 21:30 UTC
LW: 7 AF: 6
To add, here’s an excerpt from the Q&A on “How likely is deceptive alignment?”:
Question: When you say model space, you mean the functional behavior as opposed to the literal parameter space?
Evan: So there’s not quite a one-to-one mapping because there are multiple implementations of the exact same function in a network. But it’s pretty close. I mean, most of the time when I’m saying model space, I’m talking either about the weight space or about the function space where I’m interpreting the function over all inputs, not just the training data.
I only talk about the space of functions restricted to their training performance for this path dependence concept, where we get this view where, well, they end up on the same point, but we want to know how much we need to know about how they got there to understand how they generalize.
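A toy sketch of the two distinctions in the excerpt, not taken from the talk. The network, the permutation, and the functions `f` and `g` are all hypothetical choices made for illustration. Part 1 shows that weight space and function space are not one-to-one: permuting the hidden units of a small network moves it to a different point in weight space while leaving the function unchanged. Part 2 shows why restricting a function to the training data discards information: two functions can agree on every training point and still generalize differently.

```python
import math
import random


def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer tanh network mapping a list of floats to a float."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return sum(w * h for w, h in zip(W2, hidden)) + b2


# Part 1: many weight settings implement the same function.
rng = random.Random(0)
d, h = 3, 4  # input width and hidden width (arbitrary toy sizes)
W1 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
b1 = [rng.gauss(0, 1) for _ in range(h)]
W2 = [rng.gauss(0, 1) for _ in range(h)]
b2 = rng.gauss(0, 1)

# Reorder the hidden units consistently in both layers.
perm = [2, 0, 3, 1]
W1p = [W1[i] for i in perm]
b1p = [b1[i] for i in perm]
W2p = [W2[i] for i in perm]

inputs = [[rng.uniform(-5, 5) for _ in range(d)] for _ in range(100)]
max_gap = max(
    abs(mlp(x, W1, b1, W2, b2) - mlp(x, W1p, b1p, W2p, b2)) for x in inputs
)
weights_differ = W1 != W1p  # a different point in weight space
# max_gap is zero up to floating-point summation order.


# Part 2: same training behavior, different function over all inputs.
train = [0, 1, 2]


def f(x):
    return x


def g(x):
    # The added term vanishes exactly on the training points.
    return x + x * (x - 1) * (x - 2)


same_on_train = all(f(t) == g(t) for t in train)
differs_off_train = f(3) != g(3)
```

Here `f` and `g` land on the same point in "function space restricted to training performance", so telling them apart requires knowing something beyond their training behavior, which is the role the excerpt assigns to path dependence.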