porby comments on How to Control an LLM’s Behavior (why my P(DOOM) went down)

porby 29 Nov 2023 19:18 UTC
> What I’m calling a simulator (following Janus’s terminology) you call a predictor
Yup; I use the terms almost interchangeably. I tend to use “simulator” when referring to predictors used for a simulator-y use case, and “predictor” when I’m referring to how they’re trained and things directly related to that.
> I also like your metatoken concept: that’s functionally what I’m suggesting for the tags in my proposal, except I follow the suggestion of this paper to embed them via pretraining.
Yup again. To be clear, all the metatoken stuff I was talking about would also fit in pretraining; it's pretty much exactly the same thing. There are versions of it that might get some efficiency boosts by not requiring the metatokens to be present for the full duration of pretraining, but they're still similar in concept. (If we can show an equivalence between trained conditioning and representational interventions, and build representational interventions out of conditions, that could be many orders of magnitude faster.)
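For readers less familiar with the metatoken idea, here is a minimal sketch of what prepending conditioning tags to pretraining documents could look like at the data-preparation level. Everything in it is an illustrative assumption rather than anyone's actual pipeline or the procedure from any particular paper: the `<|tag|>` string format, the hypothetical `Document` type and tag names, and the `window` parameter. That parameter is one guess at the "not present for the full duration of pretraining" variant, modeled as a fraction-of-training window during which tags are included.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    # Hypothetical metadata labels (e.g. from classifiers or source info).
    tags: list[str] = field(default_factory=list)


def render_metatokens(tags: list[str]) -> str:
    """Render tags as special-token strings like <|quality:high|>.

    Tags are sorted so the same tag set always yields the same prefix.
    """
    return "".join(f"<|{tag}|>" for tag in sorted(tags))


def build_training_text(
    doc: Document,
    step: int,
    total_steps: int,
    window: tuple[float, float] = (0.0, 1.0),
) -> str:
    """Prefix metatokens only while training progress lies in [start, end).

    window=(0.0, 1.0) keeps the tags for the whole run; a narrower window
    (assumed schedule, for illustration only) includes them only for part
    of pretraining and returns the bare text otherwise.
    """
    start, end = window
    progress = step / total_steps
    if start <= progress < end:
        return render_metatokens(doc.tags) + doc.text
    return doc.text


if __name__ == "__main__":
    doc = Document("The cat sat.", ["source:fiction", "quality:high"])
    print(build_training_text(doc, step=0, total_steps=100))
    print(build_training_text(doc, step=90, total_steps=100, window=(0.0, 0.8)))
```

The only design choice that matters here is that the conditioning lives in the token stream itself, so the model learns the association between tags and content through ordinary next-token prediction; the window just controls how much of the run pays for that.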