This feels like painting with too broad a brush, and from my state of knowledge, the assumed frame eliminates at least one viable solution. For example, can one build an AI without harmful instrumental incentives (without requiring any fragile specification of “harmful”)? If you think not, how do you know that? Do we even presently have a gears-level understanding of why instrumental incentives occur?
Coincidentally, just yesterday I was part of some conversations that now make me more bullish on this approach. I haven’t thought about it much in quite a while, and now I’m returning to it.
To argue that, e.g., HCH is so likely to fail that we should feel pessimistic about it, it doesn't seem enough to say “Goodhart’s curse applies”. Goodhart’s curse applies when I’m buying apples at the grocery store. Why should we expect this bias to be enough to cause catastrophes for HCH, as it would for a superintelligent EU maximizer operating on an unbiased (but noisy) estimate of what we want? Some designs leave more room for correction and cushion, and it seems prudent to consider to what extent that is true for a proposed design.
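(To make the dynamic being invoked concrete, here is a minimal sketch of the optimizer’s curse; the toy model and numbers are my own illustration, not anything from the thread. Every option has true value zero and an unbiased but noisy estimate, and selecting the option with the highest estimate yields a positive overestimate that grows with the number of options considered.)

```python
# Toy illustration (mine, not from the thread) of the optimizer's curse.
# Every option has true value 0; its estimate is unbiased but noisy.
# Picking the option with the highest estimate yields a positively biased
# estimate of the value obtained, and the bias grows with the number of
# options under consideration.
import numpy as np

rng = np.random.default_rng(0)

def selection_bias(n_options, n_trials=10_000, noise_sd=1.0):
    # Unbiased estimates of true values that are all exactly 0.
    estimates = rng.normal(0.0, noise_sd, size=(n_trials, n_options))
    # Average value the optimizer *thinks* it got by taking the argmax.
    return estimates.max(axis=1).mean()

for n in [1, 10, 100, 10_000]:
    print(f"{n:>6} options -> mean overestimate {selection_bias(n):.2f}")
# The overestimate is ~0 with a single option and grows steadily with n.
```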
It depends on how much risk you are willing to tolerate, I think. HCH applies optimization pressure, and in the limit of superintelligence I expect so much optimization pressure that any deviance will grow large enough to become a problem. But a person could choose to accept the risk, paired with strategies for minimizing deviance, if they think those strategies will do enough to mitigate the worst of that effect in the limit.
As for leaving room for correction and cushion, those also require a relatively slow takeoff, because humans need time to think and intervene. Since I expect takeoff to be fast, I don’t expect there to be adequate time for humans in the loop to notice and correct deviance, so any deviance that can appear late in the process is a problem in my view.
This isn’t obvious to me. Mild optimization seems like a natural thing for people to imagine doing. If I think about “kinda helping you write a post but not going all-out”, the result is not at all random actions. Can you expand?
The problem with mild optimization is that it doesn’t eliminate the bias that causes the optimizer’s curse, only attenuates it. So unless a “mild” method can put a finite bound on the amount of deviance in the limit of optimization pressure, I don’t expect it to help.
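(A sketch of the attenuation point, using quantilization — sampling uniformly from the top-q fraction of options — as a stand-in for a “mild” method; this framing and the numbers are my own illustration, not the commenter’s. With unbiased, noisy estimates of options whose true value is zero, any positive mean is pure selection bias: mildness shrinks the bias, but every q < 1 leaves some of it.)

```python
# Sketch (my own framing) of a "mild" optimizer: a quantilizer that samples
# uniformly from the top-q fraction of unbiased, noisy estimates instead of
# taking the argmax. True values are all 0, so any positive mean is pure
# selection bias. Larger q (milder optimization) attenuates the bias, but
# only q = 1 (no optimization at all) eliminates it.
import numpy as np

rng = np.random.default_rng(0)

def quantilizer_bias(q, n_options=1_000, n_trials=10_000):
    estimates = rng.normal(0.0, 1.0, size=(n_trials, n_options))
    k = max(1, int(q * n_options))              # size of the "good enough" set
    top_k = np.sort(estimates, axis=1)[:, -k:]  # best-looking q fraction
    picks = top_k[np.arange(n_trials), rng.integers(0, k, size=n_trials)]
    return picks.mean()

for q in [1.0, 0.5, 0.1, 0.01, 0.001]:  # q = 0.001 is effectively argmax here
    print(f"q = {q:<6} mean overestimate {quantilizer_bias(q):.2f}")
# Bias rises from 0.0 at q = 1 toward the full argmax bias as q shrinks;
# every q < 1 leaves a positive residual bias.
```

Whether that residual bias stays finitely bounded as optimization pressure increases is exactly the question the comment raises.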
The potential solution I was referring to is motivated in the recently-completed Reframing Impact sequence.