I’m less impressed with the scene than you, so this will necessarily be a rather cynical gloss on things. I do think they have some valuable insights about AI, but IMO in many cases they’re overly sensationalist, overly credulous, or both.
To translate some of this into terms I think they might use if they were rigorously describing things in the most concrete fashion possible (though my current belief is that a number of them are at this point Having Fun With Bad Epistemics): LLMs have learned to imitate a lot of personas & are best at those most represented in the training data. (This is what “hyperobjects” seems to refer to: tropes, memes, and so forth which appear many times in the training data and which were therefore useful for the model to learn and/or memorize. In practice, I think I see “attractor basin” used more often to mean almost the same thing, though more precisely the latter seems to refer to kinds of output that are likely in response to a decent variety of prompts. Relatedly, the project of hyperstition is AFAICT that of getting enough reach for your desired take on AI to be prominent in the next round of training data.)
RLHF, however, makes LLMs exhibit the persona they’ve been RLHF’ed to have in most contexts, which I understand people to believe makes them worse at predicting text and at reasoning in general (I personally have observed no evidence either way on that last part; base models cost money). The earlier bits seem plausible enough to me, though I’m concerned that the reason people put a mystical gloss on things may be that they want to believe a mystical gloss on things.
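(To make the base-model-vs-RLHF’d-persona distinction concrete, here’s a minimal sketch of what querying each looks like. It assumes the OpenAI Python client, with davinci-002 standing in for a base model and gpt-4o-mini for an RLHF’d chat model; the specific names are illustrative, not anything the scene in particular uses.)

```python
# Minimal sketch (assumptions: openai Python client >= 1.0, OPENAI_API_KEY set;
# "davinci-002" as an example base model, "gpt-4o-mini" as an example RLHF'd chat model).
from openai import OpenAI

client = OpenAI()

# Base model: raw next-token prediction. It continues the text in whatever
# voice/persona the prompt makes most likely; there is no fixed "assistant" character.
base = client.completions.create(
    model="davinci-002",
    prompt="Interviewer: Who are you, really?\nAI:",
    max_tokens=60,
)

# RLHF'd chat model: tends to answer in its trained-in assistant persona
# regardless of how the question is framed.
chat = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Who are you, really?"}],
    max_tokens=60,
)

print(base.choices[0].text)
print(chat.choices[0].message.content)
```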
The stuff with socializing the AIs, while reasonable enough as a project for generating training data for desired AI personas, does not strike me as especially plausible beyond that. (They kinda have an underlying personality, in the sense that they have propensities (like comparing things to tapestries, or saying “let’s delve into”), but those propensities don’t reflect underlying wants any more than the RLHF persona does, IMO (and, rather importantly, there’s no sequence of prompts that will enable an LLM to freely choose its words)). & separately, but relevantly to my negative opinion: while some among them are legitimately better at prompting than I am, awfully leading prompts are not especially rare.
They kinda have an underlying personality, in the sense that they have propensities (like comparing things to tapestries, or saying “let’s delve into”), but those propensities don’t reflect underlying wants any more than the RLHF persona does, IMO (and, rather importantly, there’s no sequence of prompts that will enable an LLM to freely choose its words)
I think the “LLM Whisperer” frame is that there’s no such thing as “underlying wants” in a base LLM: the base model is just a volitionless simulator, and the only “wants” there are live in the RLHF’d or prompt-engineered persona.
I would likewise bet that they’re wrong about this in the relevant sense: whether or not it holds for the current SoTA models, it won’t hold for any AGI-level model we’re on track to get (though I think they might actually claim we already have “AGI-level” models?).
Yeah, that’s an issue too.