As one myself, I find this research extremely interesting. Since suggestions are welcome, I am writing a follow-up on this, with some (tentative) interpretations. I find the interpretation GPT-4 gave quite lacking. More than anything else, it sounds like the RLHF from OpenAI responding to our culture’s general misogyny.
Could you elaborate on why you are calling this ‘ontology’? Even if we take ontology to mean ‘the model’s representation of the world’, it sounds weird to me to characterize the meaning encoded in the embeddings of a generative model as ontological.
RLHF is not necessary to see these behaviors; the original post is not (only) about RLHFed models, and mere predictive models of text are enough, since that’s what was studied here. One has to be quite careful to analyze the results of models like this strictly in terms of the causal process that generated the model. I’m a fan of epistemically careful psychoanalysis, but it’s mighty rare; tools like this make highly careful psychoanalysis actually possible as a form of large-scale mechinterp, like the original post. And don’t lose track of the fact that AIs will have weird representation differences arising from differences between what’s natural for brains (3D asynchronous-spiking proteins-and-chemicals complex neurons, in a highly local recurrent brain, trained with simple local cost functions and evolution-pretrained context-dependent reinforcement-learning responses, which allow an organism to generate its own experiences by exploration) and what’s natural for current AIs (simple rectified-linear floating-point neurons, in a synchronous self-attention network, trained by global backprop gradient descent on a fixed dataset). There’s a lot of similarity between humans and current AIs, but also a lot of difference; I wouldn’t assume that all people have the same stuff in the space between meanings as these models do, though I do imagine it’s reasonably common.
I had noticed some tweets in Portuguese! I just went back and translated a few of them. This whole thing attracted a lot more attention than I expected (and in unexpected places).
Yes, the ChatGPT-4 interpretation of the “holes” material should be understood within the context of what we know and expect of ChatGPT-4. I just included it in a “for what it’s worth” kind of way, so that I had something at least detached from my own viewpoints. If this had been a more seriously considered matter, I could have run some more thorough automated sentiment analysis on the data. But I think the data speaks for itself; I wouldn’t put a lot of weight on the ChatGPT analysis.
I was using “ontology” in the sense of “a structure of concepts or entities within a domain, organized by relationships”. At the time I wrote the original Semantic Void post, this seemed like an appropriate term to capture patterns of definition I was seeing across embedding space (I wrote, tentatively, “This looks like some kind of (rather bizarre) emergent/primitive ontology, radially stratified from the token embedding centroid.”). Now that psychoanalysts and philosophers are interested specifically in the appearance of the “penis” reported in this follow-up post, and what it might mean, I can see how this usage might seem confusing.
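For readers unfamiliar with the underlying experiments: the “radial stratification” refers to probing points at fixed Euclidean distances from the centroid of the model’s token embedding matrix and asking the model to define the “word” at each point. A minimal sketch of just the sampling geometry, using synthetic embeddings in place of a real model’s (the array shapes and the `sample_at_radius` helper here are illustrative, not the original code):

```python
import numpy as np

# Toy stand-in for a model's token embedding matrix (vocab_size x dim).
# The actual experiments probe a real LLM's input embeddings; this only
# sketches the geometric sampling step with random synthetic data.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))

# Centroid of all token embeddings.
centroid = embeddings.mean(axis=0)

def sample_at_radius(radius, n=5):
    """Sample n random points at a fixed Euclidean distance from the centroid."""
    directions = rng.normal(size=(n, embeddings.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return centroid + radius * directions

# Points "radially stratified" from the centroid: in the experiments, each
# such point is substituted for a token embedding and the model is prompted
# to define the resulting nonexistent word.
points = sample_at_radius(radius=2.5)
dists = np.linalg.norm(points - centroid, axis=1)
print(np.allclose(dists, 2.5))  # every sampled point sits at the chosen radius
```

Sweeping `radius` over a range of values is what reveals the distance-dependent bands of definitions described in the original post.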
You have garnered the attention of Brazilian psychoanalysts on Twitter! (see https://twitter.com/joaomulher_/status/1761489063440159092?t=yNJzcsQ2T0heaQzEy23VhQ&s=19).