How To Make Prediction Markets Useful For Alignment Work
So, I’m an alignment researcher. And I have a lot of prediction-flavored questions which steer my day-to-day efforts. Which recent interpretability papers will turn out, in hindsight a year or two from now, to have been basically correct and high-value? Will this infrabayes stuff turn out to be useful for real-world systems, or is it the kind of abstract math which lost contact with reality and will never reconnect? Is there any mathematical substance to Predictive Processing other than “variational inference is a thing” plus vigorous hand-waving? How about Shard Theory? What kinds of evidence will we see for/against the Natural Abstraction Hypothesis over the next few years? Will HCH-style amplification ever be able to usefully factor problems which the human operator/programmer doesn’t immediately see how to factor? Will some version of this conjecture be proven? Will it be proven by someone else if I don’t focus on it? Are there any recent papers/posts which lots of other people expect to have high value in hindsight, but which I haven’t paid attention to?
Or, to sum that all up in two abstract questions: what will I (or someone else whose judgement I at least find informative) think I should have paid more attention to, in hindsight? What will it turn out, in hindsight, that I should have ignored or moved on from sooner?
Hmm, I wonder if the prediction markets have anything useful to say here? Let’s go look for AI predictions on Manifold…
Hollywood-level AI-generated feature film by 2026?
Will an AI get gold on any International Math Olympiad by 2025?
Will AI wipe out humanity before the year 2100?
Will any Fortune 500 corporation mostly/entirely replace their customer service workforce with AI by 2026?
…
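If you want to reproduce that kind of search programmatically rather than by eyeballing the site, here’s a minimal sketch using Manifold’s public API. The `search-markets` endpoint and the `question`/`probability` fields on each result are assumptions based on the API docs at the time of writing, so double-check them before building anything on top of this.

```python
# Minimal sketch: pull AI-related markets from Manifold's public API.
# Endpoint name, query parameters, and response fields are assumptions
# taken from the API docs (https://docs.manifold.markets/api); verify
# them before relying on this.
import requests

resp = requests.get(
    "https://api.manifold.markets/v0/search-markets",
    params={"term": "AI", "limit": 20},
    timeout=10,
)
resp.raise_for_status()

for market in resp.json():
    # Binary markets expose a single `probability`; other market types may not.
    prob = market.get("probability")
    prob_str = f"{prob:.0%}" if prob is not None else "n/a"
    print(f"{prob_str:>4}  {market['question']}")
```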
So, we’ve got about a gazillion different flavors of AI capabilities questions, with a little bit of dabbling into how-society-will-react-to-capabilities questions. On the upside, operationalizing things in lots of different ways is exactly what we want. On the downside, approximately-all of the effort is in operationalizing one thing (AI capabilities timelines), and that thing is just not particularly central to day-to-day research decisions. It’s certainly not on the list of questions which first jump to mind when I think of things which would be useful to know to steer my research. Sure, somebody is going to argue in the comments that timelines are relevant for some particular decision, like whether to buy up GPU companies or something, but there’s no way in hell that focusing virtually all alignment-related prediction-market effort on operationalizations of capabilities timelines is actually the value-maximizing strategy here.
(And while I happened to open up Manifold first, the situation is basically the same in other prediction markets. To a first approximation, the only alignment-adjacent question prediction markets ever weigh in on is timelines.)
So, here’s my main advice to someone who wants to use prediction markets to help alignment work: imagine that you are an alignment researcher/grantmaker/etc. Concretely imagine your day-to-day: probing weight matrices in nets, conjecturing, reading papers/posts, reviewing proposals, etc. Then, ask what kind of predictions have the highest information value for your work. If the answer is “yet another operationalization of timelines”, then you have probably fucked up somewhere.
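To make “highest information value” concrete, here’s a toy value-of-perfect-information sketch for a single decision of the “keep working on this conjecture vs. move on” variety. Every number in it is made up purely for illustration; the point is the shape of the calculation: a market question is valuable to the extent that its answer would change what I actually do.

```python
# Toy value-of-(perfect-)information calculation for one research decision.
# All probabilities and payoffs below are made-up illustrative numbers,
# not claims about any real market or research agenda.

DECISIONS = ("work_on_it", "move_on")

# Market-style probability that someone else proves the conjecture anyway.
p_proven_by_others = 0.3

# Subjective payoffs (arbitrary units) for each (decision, outcome) pair.
payoff = {
    ("work_on_it", True): 2,    # duplicated effort, modest credit
    ("work_on_it", False): 10,  # I prove it, and it mattered that I did
    ("move_on", True): 8,       # freed up for other work, theorem still gets proven
    ("move_on", False): 3,      # nobody proves it, opportunity lost
}

def expected_payoff(decision: str, p: float) -> float:
    """Expected payoff of a decision, given probability p of the outcome."""
    return p * payoff[(decision, True)] + (1 - p) * payoff[(decision, False)]

# Without the market: commit to the single best decision under the prior.
ev_without = max(expected_payoff(d, p_proven_by_others) for d in DECISIONS)

# With a (hypothetically perfect) market signal: pick the best decision
# separately for each outcome, weighted by how likely that outcome is.
ev_with = (
    p_proven_by_others * max(payoff[(d, True)] for d in DECISIONS)
    + (1 - p_proven_by_others) * max(payoff[(d, False)] for d in DECISIONS)
)

print(f"EV without the market: {ev_without:.2f}")
print(f"EV with the market:    {ev_with:.2f}")
print(f"Value of information:  {ev_with - ev_without:.2f}")
```

A question whose answer wouldn’t flip any decision has zero value by this measure, no matter how precisely it’s operationalized.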
Of course, you might also try asking some researchers or grantmakers the same question, though keep in mind the standard user-interview caveat: users do not actually know what they want.