Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.
Great post.
Re Regulatory Markets for AI Safety: I’d be interested in hearing more about why you think that they might not be useful in an AGI scenario, because “the goals and ex post measurement for private regulators are likely to become outdated and irrelevant”.
Why do you think the goals and ex post measurement are likely to become irrelevant? Furthermore, isn’t this also an argument against any kind of regulatory action on AI? With status-quo regulation, the goals and ex post measurement of regulatory outcomes are of course also done by a government agency. (Though I’m not sure I correctly understand what you mean by “ex post measurement for the private regulators”.)
Is this a fair description of your disagreement re the 90% argument?
Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.
Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in the power/influentialness of a civilization, which occurs mainly through a reduction in their ability to coordinate (e.g. “chain of command in ruins”). This is significant enough to undermine the main conclusion.
If this is accurate, would a historical survey of the power/influentialness of civilisations after they lose 90% of the population (inasmuch as these cases exist) resolve the disagreement?
Thanks for clarifying. That’s interesting and seems right if you think we won’t draft legal contracts with AI. Could you elaborate on why you think that?
I think it’s worth distinguishing between a legal contract and setting the AI’s motivational system, even though the latter is a contract in some sense. My reading of Stuart’s post was that it was intended literally, not as a metaphor. Regardless, both are relevant; in PAL, you’d model the motivational system via the agent’s utility function, and contract enforceability via the background assumptions.
But I agree that contract enforceability isn’t a knock-down, and indeed won’t be an issue by default. I think we should have framed this more clearly in the post. Here’s the most important part of what we said:
But it is plausible for when AIs are similarly smart to humans, and in scenarios where powerful AIs are used to enforce contracts. Furthermore, if we cannot enforce contracts with AIs then people will promptly realise and stop using AIs; so we should expect contracts to be enforceable conditional upon AIs being used.
I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potential beneficial outcome would be to understand which kind of systems earn rents and which don’t; I wouldn’t be surprised if the distinction between rent earning agents vs others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won’t.
In any case, the alternative perspective offered by the agency rents framing compared to typical AI alignment discussion could help generate interesting new insights.
Thanks! Yeah, we probably should have included a definition. The wikipedia page is good.
Thank you! :)
I wouldn’t characterise the conclusion as “nope, doesn’t pan out”. Maybe more like: we can’t infer too much from existing PAL, but AI agency rents are an important consideration, and for a wide range of future scenarios new agency models could tell us about the degree of rent extraction.
The claim that this couldn’t work because such models are limited seems just arbitrary and wrong to me.
The economists I spoke to seemed to think that in agency unawareness models conclusions follow pretty immediately from the assumptions and so don’t teach you much. It’s not that they can’t model real agency problems, just that you don’t learn much from the model. Perhaps if we’d spoken to more economists there would have been more disagreement on this point.
Aside from the arguments we made about modelling unawareness, I don’t think we were claiming that econ theory wouldn’t be useful; we argue that new agency models could tell us about the levels of rents extracted by AI agents. Our claims were just that i) we can’t infer much from existing models, because they model different situations and are brittle, and ii) models won’t shed light on phenomena beyond what they are trying to model.
The intuition is that if the principal could perfectly monitor whether the agent was working or shirking, they could just specify a clause in the contract that punishes the agent whenever they shirk. Equivalently, if the principal knows the agent’s cost of production (or ability level), they can extract all the surplus without leaving the agent any rent.
Pages 40-53 of The Theory of Incentives contrasts these “first best” and “second-best” solutions (it’s easy to find online).
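To make the contrast concrete, here’s a minimal numeric sketch of the standard two-type screening model from that part of the book. All the specific numbers and functional forms (the value function, the two cost types, the probability) are illustrative assumptions of mine, not anything from the post: the point is just that rent is zero under full information and positive under asymmetric information.

```python
import math

# Principal's (concave) value of output q, and the agent's possible
# marginal costs (types). These specific values are assumptions.
S = lambda q: 2 * math.sqrt(q)
theta_L, theta_H = 0.5, 1.0   # low-cost and high-cost type
nu = 0.5                       # probability the agent is low-cost

# First best: the principal observes the type, pays exactly the agent's
# cost, and leaves zero rent. Optimal q solves S'(q) = theta,
# i.e. 1/sqrt(q) = theta, so q = 1/theta^2.
q_fb = lambda theta: (1 / theta) ** 2
rent_fb = 0.0

# Second best: the type is private. The low-cost type can mimic the
# high-cost type, so incentive compatibility forces the principal to
# concede an information rent of (theta_H - theta_L) * q_H, and the
# high type's output is distorted downward:
#   S'(q_H) = theta_H + nu/(1-nu) * (theta_H - theta_L)
q_H_sb = (1 / (theta_H + nu / (1 - nu) * (theta_H - theta_L))) ** 2
rent_sb = (theta_H - theta_L) * q_H_sb

print(q_fb(theta_L), q_fb(theta_H))  # first-best quantities (4.0, 1.0)
print(q_H_sb, rent_sb)               # distorted quantity, positive rent
```

With these numbers the second-best output for the high-cost type drops from 1 to 4/9, and the low-cost type captures a rent of 2/9 — exactly the gap between the “first-best” and “second-best” solutions the book describes.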
Thanks for catching this! You’re correct that that sentence is inaccurate. Our views changed while iterating the piece and that sentence should have been changed to: “PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents.”
This sentence too: “Overall, PAL tells us that agents will inevitably extract some agency rents…” would be better as “Overall, PAL is consistent with AI agents extracting some agency rents…”
I’ll make these edits, with a footnote pointing to your comment.
The main aim of that section was to point out that Paul’s scenario isn’t in conflict with PAL. Without further research, I wouldn’t want to make strong claims about what PAL implies for AI agency rents because the models are so brittle and AIs will likely be very different to humans; it’s an open question.
For there to be no agency rents at all, I think you’d need something close to perfect competition between agents. In practice the necessary conditions are basically never satisfied because they are very strong, so it seems very plausible to me that AI agents extract rents.
Re monopoly rents vs agency rents: Monopoly rents refer to the opposite extreme, where there is very little competition, and in the economics literature the term is used when talking about firms; agency rents are present whenever competition and monitoring are imperfect. Also, agency rents refer specifically to the costs inherent in delegating to an agent (e.g. an agent making investment decisions optimising for commission over firm profit), as opposed to rents from monopoly power (e.g. being the only firm able to use a technology due to a patent). But as you say, it’s true that lack of competition is a cause of both of these.
What can the principal-agent literature tell us about AI risk?
I’ve also found it hard to find relevant papers.
Behavioural Contract Theory reviews papers based on psychology findings and notes:
In almost all applications, researchers assume that the agent (she) behaves according to one psychologically based model, while the principal (he) is fully rational and has a classical goal (usually profit maximization).
Optimal Delegation and Limited Awareness is relevant insofar as you consider an agent knowing more facts about the world to be akin to them being more capable. Papers which consider contracting scenarios with bounded rationality, though not exactly principal-agent problems, include Cognition and Incomplete Contracts and Satisfying Contracts. There are also some papers where the principal and agent have heterogeneous priors, but the agent typically has the false prior. I’ve talked to a few economists about this, and they weren’t able to suggest anything I hadn’t seen (I don’t think my literature review is totally thorough, though).
Thanks for writing this! It was useful when organising my workout routine.
I read the Kovacevic et al paper on sleep you cite, and there are some caveats probably relevant to some LW readers. In particular, the benefits are less clear for younger adults.
Acute resistance exercise studies
“There was some evidence that an acute bout of resistance exercise may reduce the number of arousals during sleep”
They base this on three studies. The cohorts are elderly (65–80 years), middle-aged (mean 44.4 ± 8 years), and young (21.9 ± 2.7 years). They note that “in the final study in healthy young-to-middle aged adults, no effects were observed on sleep quality measured using accelerometry, or on 5- and 7-point Likert scales following an acute bout of resistance exercise”
Chronic resistance exercise studies
“Overall, the data suggest that chronic resistance exercise has significant benefit on subjective sleep quality”
They base this on seven studies. “All four studies performed in older adults reported significant improvements in sleep quality...however, results were inconsistent for younger adults, with only one out of three studies reporting significant improvements.” (the study with the significant improvements was of young women).
The above results focus on sleep quality because, as you say, sleep quantity tends not to be much influenced by resistance training. Nevertheless, note the following:
“The remaining study in younger adults (many with insomnia) reported a large but non-significant negative effect on sleep duration following moderate-intensity resistance training on both weekdays and weekends.”
I was curious to see if this apparent age effect exists for aerobic exercise. Kovacevic et al cite the following Kredlow et al paper as a “recent review of aerobic exercise”: https://www.ncbi.nlm.nih.gov/pubmed/25596964. It is indeed predominantly about aerobic exercise, but it also covers some anaerobic exercise studies. According to Kovacevic et al, it includes 10 fewer resistance training papers than their own review, which by my count makes 11 resistance training papers. Kredlow et al write:
“For the majority of outcomes, there is no difference for the benefits of exercise on sleep depending on age or sex. We did, however, find significant differences for certain sleep variables. Specifically, the benefits of acute exercise did not vary by age...the benefits of regular exercise did not vary by sex and for one variable (sleep onset latency), appeared to be stronger for younger than older individuals.”
Overall, it’s quite unclear to me whether resistance exercise has sleep benefits for younger adults, and the evidence for aerobic exercise seems stronger (although I’d like to find a review solely of aerobic exercise studies).
Glad to see this published—nice work!