Thanks for this post; I don’t know much about Ought other than what you’ve just said, so sorry if this has already been answered elsewhere:
You say that “Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment.”
It also seems like a crucial step in pretty much all institution design. Surely there is a large literature on this already? Surely there have been scientific experiments run on this already? What does the state of modern science on this question look like right now, and does Ought have plans to collaborate with academics in some manner? A quick skim of the Ought website didn’t turn up any references to existing literature.
There is a large economics literature on principal-agent problems, optimal contracting, etc.; these usually consider the situation where we can discover the ground truth or see the outcome of a decision (potentially only partially, or at some cost), and the question is how to best structure incentives in light of that. That assumption typically holds for a profit-maximizing firm, at least to some extent, since it ultimately wants to make money. I’m not aware of work in economics that addresses the situation where there is no external ground truth, except to prove negative results which justify the use of other assumptions. I don’t believe there’s much there that would be useful to Ought, probably because that setting is a huge mess and hard to fit into the field’s usual frameworks.
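(To make the contrast concrete: when the ground truth is eventually observable, a standard tool is a proper scoring rule, which pays the expert so that reporting their true belief maximizes their expected payment. A minimal sketch of that idea, with illustrative names and numbers of my own rather than anything from Ought or the post:

```python
import numpy as np

def brier_reward(report: float, outcome: int) -> float:
    """Quadratic (Brier-style) reward: 1 minus squared error against the realized outcome."""
    return 1.0 - (outcome - report) ** 2

def expected_reward(report: float, true_belief: float) -> float:
    """Expert's expected reward if the event occurs with probability true_belief."""
    return (true_belief * brier_reward(report, 1)
            + (1 - true_belief) * brier_reward(report, 0))

# An expert who privately believes P(event) = 0.7 does best by reporting 0.7,
# rather than shading toward 0.5 or exaggerating toward 1.0.
reports = np.linspace(0.0, 1.0, 101)
best_report = reports[np.argmax([expected_reward(r, 0.7) for r in reports])]
print(best_report)  # ~0.7
```

The whole trick relies on the outcome variable eventually being observed; once there is no external ground truth to score against, this kind of mechanism no longer applies, which is the gap I’m pointing at.)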
(I actually think even the core economics questions relevant to Ought, where you do have a ground truth and expensive monitoring, a pool of risk-averse experts some of whom are malicious, etc., aren’t fully answered in the economics literature, and that these versions of the questions aren’t a major focus in economics despite being theoretically appealing from a certain perspective. But (i) I’m much less sure of that, and someone would need to have some discussion with relevant experts to find out, and (ii) in that setting I do think economists have things to say even if they haven’t answered all of the relevant questions.)
In practice, I think institutions are basically always predicated on one of (i) having some trusted experts, or a principal with understanding of the area, (ii) having someone trusted who can at least understand the expert’s reasoning when adequately explained, or (iii) being able to monitor outcomes to see what ultimately works well. I don’t really know of institutions that do well when none of (i)-(iii) apply. Those work OK in practice today but seem to break down quickly as you move to the setting with powerful AI (though even today I don’t think they work great and would hope that a better understanding could help; I just wouldn’t necessarily expect it to help as much as work that engages directly with existing institutions and their concrete failures).