I think this is extremely misleading. Firstly, real-world data in high dimensions basically never look like spheres. Such data almost always cluster in extremely compact manifolds, whose internal volume is minuscule compared to the full volume of the space they’re embedded in.
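To make that volume point concrete, here is a minimal sketch (the slice-through-a-cube setup, ε, and the sample counts are arbitrary illustrative choices, not anything from the original comment): it estimates the fraction of a unit hypercube lying within a small distance ε of a 3-dimensional coordinate slice. The exact fraction is (2ε)^(n−3), which collapses exponentially as the ambient dimension n grows.

```python
# Minimal sketch (illustrative only): how little of a high-dimensional cube sits
# near a low-dimensional "manifold". Here the manifold is just a k-dimensional
# coordinate slice through the middle of the unit cube, and we count how many
# uniform samples land within eps of it. The true fraction is (2 * eps) ** (n - k),
# which collapses exponentially as the ambient dimension grows.
import numpy as np

rng = np.random.default_rng(0)

def fraction_near_slice(n_ambient, k_intrinsic, eps=0.05, n_samples=200_000):
    """Fraction of uniform points in [0,1]^n within eps of the slice
    where the last (n - k) coordinates all equal 0.5."""
    x = rng.uniform(size=(n_samples, n_ambient))
    off_manifold = np.abs(x[:, k_intrinsic:] - 0.5)  # distance in the "fixed" coords
    near = np.all(off_manifold < eps, axis=1)
    return near.mean()

for n in (3, 5, 10, 20):
    est = fraction_near_slice(n_ambient=n, k_intrinsic=3)
    exact = (2 * 0.05) ** (n - 3)
    print(f"n={n:2d}: sampled fraction = {est:.2e}, exact (2*eps)^(n-k) = {exact:.2e}")
```

(For larger n the Monte Carlo estimate just reads 0.00e+00, which is the point: at 200,000 samples, essentially nothing lands near the slice.)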
I agree with your picture of how manifolds work; I don’t think it actually disagrees all that much with Yudkowsky’s.
That is, the picture where all humans are basically the same make and model of car, running the same brand of engine, just painted different colors, is the claim that the intrinsic dimension of human minds is pretty small. (Taken literally, it’s 3, for the three dimensions of color-space.)
And so if you think there are, say, 40 intrinsic dimensions to mind-space, and humans are fixed on 37 of those dimensions and variable on the other 3, well, I think we have basically the Yudkowskian picture.
(I agree that if Yudkowsky’s picture were that there were 40M dimensions and humans varied on only 3, it would be comically wrong, but I don’t think that’s what he’s imagining for that argument.)
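For concreteness, here is a minimal sketch of what "fixed on 37, variable on 3" looks like numerically. The 40/37/3 split comes from the paragraph above; the distributions and noise scale are arbitrary choices for illustration. PCA on such data recovers an effective dimension of about 3 inside the 40-dimensional ambient space.

```python
# Minimal sketch (illustrative only): points in a 40-dimensional "mind-space" that are
# pinned on 37 coordinates and vary on 3, plus a little jitter. PCA on the samples
# recovers ~3 directions carrying nearly all the variance, i.e. a tiny intrinsic
# dimension sitting inside a much larger ambient space.
import numpy as np

rng = np.random.default_rng(0)

ambient_dim, intrinsic_dim, n_points = 40, 3, 5_000
points = np.tile(rng.uniform(size=ambient_dim), (n_points, 1))          # 37 "fixed" coordinates
points[:, :intrinsic_dim] = rng.normal(size=(n_points, intrinsic_dim))  # 3 that actually vary
points += 1e-3 * rng.normal(size=points.shape)                          # small jitter everywhere

# PCA via SVD of the centered data: squared singular values are proportional
# to the variance captured by each principal direction.
centered = points - points.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

print("variance explained by top 5 components:", np.round(explained[:5], 4))
print("effective dimension (components needed for 99% of variance):",
      int(np.searchsorted(np.cumsum(explained), 0.99) + 1))
```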
Addressing this objection is why I emphasized how little information content architecture / optimizers contribute to minds, compared to training data. We’ve gotten very far in instantiating human-like behaviors by training networks on human-like data. I’m saying the primacy of data in determining minds means you can get surprisingly close in mindspace, much closer than you would expect if you thought architecture / optimizer / etc. were the most important factors.
Obviously, there are still huge gaps between the sorts of data an LLM is trained on and the implicit loss functions human brains actually minimize, so it’s kind of surprising we’ve even gotten this far. The implication I’m pointing to is that it’s feasible to get really close to human minds along the important dimensions related to values and behaviors, even without replicating all the quirks of human mental architecture.