Me: *looks at some examples* “These operationalizations are totally ad-hoc. Whoever put together the fine-tuning dataset didn’t have any idea what a robust operationalization looks like, did they?”
… So maybe we should fund an effort to fine-tune some AI model on a carefully curated dataset of good operationalizations? I’m not convinced building it would require alignment-research expertise specifically; being “good at understanding the philosophy of math” might suffice.
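To make that concrete, here’s a minimal sketch of what a single entry in such a dataset might look like. The JSONL schema, the field names, and the example itself are all hypothetical, just to illustrate the shape of the data:

```python
import json

# Hypothetical schema for one entry in a curated dataset of "good
# operationalizations": an informal research question paired with a
# formal framing and a note on what makes the framing robust.
entry = {
    "informal_question": (
        "When does an agent 'know' a fact about its environment?"
    ),
    "operationalization": (
        "Model the agent as a policy pi; say pi knows predicate P iff "
        "pi's expected utility degrades under interventions that hold P "
        "fixed while scrambling the observable correlates of P."
    ),
    "why_robust": (
        "Grounds 'knowledge' in behavior under intervention rather than "
        "in internal representations, so it is architecture-agnostic."
    ),
}

# Fine-tuning pipelines typically consume one JSON object per line (JSONL).
with open("operationalizations.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```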
Finding the right operationalization is only partly intuition; partly it’s just knowing what sorts of math tools are available, i.e. what already exists in concept-space and has been discovered. That part basically requires having a fairly legible high-level mental map of the entire space of mathematics, and building that map is very effortful, takes many years, and offers very little return from learning any one specific piece of math.
At least, it’s definitely something I’m bottlenecked on, and IIRC even the Infra-Bayesianism people ended up deriving from scratch a bunch of math that later turned out to be already known as part of imprecise probability theory. So it may be valuable to get some sort of “intelligent applied-math wiki” that babbles possible operationalizations at you/points you towards math fields that may have the tools for modeling what you’re trying to model.
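As a rough sketch of how that wiki’s core lookup could work (assuming access to some off-the-shelf sentence-embedding model; the `embed` callable below is a placeholder, and the index entries are purely illustrative), it could be as simple as nearest-neighbor search from a plain-language description of your modeling need into a hand-curated map of math fields:

```python
from typing import Callable

import numpy as np

# Hand-curated index: math field -> what it's good at modeling.
# Entries are illustrative, not exhaustive.
MATH_FIELDS = {
    "imprecise probability": "sets of probability distributions; decisions under ambiguity",
    "optimal transport": "comparing distributions; costs of moving probability mass",
    "domain theory": "semantics of computation; fixed points of monotone maps",
    "ergodic theory": "long-run behavior of dynamical systems; invariant measures",
}


def suggest_fields(query: str, embed: Callable[[str], np.ndarray], k: int = 2) -> list[str]:
    """Return the k fields whose descriptions best match the query.

    `embed` is assumed to map text to a fixed-size vector (any
    off-the-shelf sentence-embedding model would do); fields are
    ranked by cosine similarity to the query.
    """

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    q = embed(query)
    scores = {field: cosine(q, embed(desc)) for field, desc in MATH_FIELDS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Usage, with a real embedding model substituted for `embed`:
# suggest_fields("I want to model an agent with Knightian uncertainty", embed)
```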
That said, I broadly agree that the whole “accelerate alignment research via AI tools” agenda doesn’t seem very promising, in either the Cyborgism or the Conditioning Generative Models direction. Not that I see any fundamental reason why pre-AGI AI tools can’t be somehow massively helpful for research — on the contrary, it feels like there ought to be some way to loop them in. But it sure seems trickier than it looks at first or second glance.
Just a small point: InfraBayesianism is (significantly) more general than imprecise probability.
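To gesture at why (notation mine, not the canonical formulation): imprecise probability evaluates a function against a credal set, a convex set of probability distributions, whereas infra-Bayesianism evaluates against a set of sa-measures, pairs of a sub-probability measure $m$ and a nonnegative offset $b$:

$$\underline{E}_{\mathcal{C}}[f] \;=\; \inf_{p \in \mathcal{C}} \mathbb{E}_p[f], \qquad \mathcal{C} \subseteq \Delta(X) \ \text{a credal set,}$$

$$\underline{E}_{\Theta}[f] \;=\; \inf_{(m,\, b) \in \Theta} \left( \int f \, dm + b \right), \qquad m \ \text{a sub-probability measure},\ b \ge 0.$$

The credal-set case is recovered when every $m$ is a proper probability distribution and every $b$ is zero; the extra degrees of freedom are part of what makes the infra-Bayesian framework more general.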
But the larger point stands: a lot of the math that alignment researchers need is already in the literature, and an AI tool that could find those pieces of math and work with them profitably would be very useful.