The part I’ll agree with is: If we look at a dial, we can ask the question:
If there’s an AGI with a safety-capabilities tradeoff dial, to what extent is the dial’s setting externally legible / auditable to third parties?
More legible / auditable is better, because it could help enforcement.
I agree with this, and I have just added it to the article. But I disagree with your suggestion that this is counter to what I wrote. In my mind, it’s an orthogonal dimension along which dials can vary. I think it’s good if the dial is auditable, and I think it’s also good if the dial corresponds to a very low alignment tax rate.
I interpret your comment as saying that the alignment tax rate doesn’t matter because there will be enforcement, but I disagree with that. I would invoke an analogy to actual taxes. It is already required and enforced that individuals and companies pay (normal) taxes. But everyone knows that a 0.1% tax on Thing X will have a higher compliance rate than an 80% tax on Thing X, other things equal.
After all, everyone is making decisions about whether to pay the tax, versus not pay the tax. Not paying the tax has costs. It’s a cost to hire lawyers that can do complicated accounting tricks. It’s a cost to run the risk of getting fined or imprisoned. It’s a cost to pack up your stuff and move to an anarchic war zone, or to a barge in the middle of the ocean, etc. It’s a cost to get pilloried in the media for tax evasion. People will ask themselves: are these costs worth the benefits? If the tax is 0.1%, maybe it’s not worth it, maybe it’s just way better to avoid all that trouble by paying the tax. If the tax is 80%, maybe it is worth it to engage in tax evasion.
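To make the cost-benefit comparison concrete, here is a minimal toy sketch. Everything in it is an illustrative assumption rather than data: the function names, the income of 100 units, and the list of per-actor evasion costs are all made up. Each actor simply pays the tax whenever paying is no more expensive than evading.

```python
def complies(tax_rate, income, evasion_cost):
    """An actor pays the tax when paying costs no more than evading it."""
    return tax_rate * income <= evasion_cost


def compliance_rate(tax_rate, income, evasion_costs):
    """Fraction of actors who choose to pay, given each actor's evasion cost."""
    payers = sum(complies(tax_rate, income, cost) for cost in evasion_costs)
    return payers / len(evasion_costs)


# Hypothetical numbers: every actor earns 100 units, and the cost of evading
# (lawyers, risk of fines, relocating, bad press) differs from actor to actor.
INCOME = 100.0
EVASION_COSTS = [1.0, 5.0, 20.0, 50.0, 90.0]

low = compliance_rate(0.001, INCOME, EVASION_COSTS)  # 0.1% tax
high = compliance_rate(0.80, INCOME, EVASION_COSTS)  # 80% tax
print(low, high)
```

In this toy model, enforcement shows up as higher evasion costs, so stronger enforcement and a lower tax rate both push compliance up, and neither makes the other irrelevant.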
So anyway, I agree that “there will be good enforcement” is plausibly part of the answer. But good enforcement plus low tax will sum up to higher compliance than good enforcement by itself. Unless you think “perfect watertight enforcement” is easy, so that “willingness to comply” becomes completely irrelevant. That strikes me as overly optimistic. Perfect watertight enforcement of anything is practically nonexistent in this world. Perfect watertight enforcement of experimental AGI research would strike me as especially hard. After all, AGI research is feasible to do in a hidden basement / anarchic war zone / barge in the middle of the ocean / secret military base / etc. And there are already several billion GPUs untraceably dispersed all across the surface of Earth.
Based on what you say above, I do not think we fundamentally disagree. There are orthogonal dimensions to safety mechanism design which are all important.
I somewhat singled out your line of ‘the lower the better’ because I felt that your taxation framing was too one-dimensional.
There is another matter: in US/UK political discourse, it is common that if someone wants to prevent the government from doing something useful, this something will be framed as a tax, or as interfering with economic efficiency. If someone does want the government to actually do a thing, in fact spend lavishly on doing it, the same thing will often be framed as enforcement. This observation says something about the quality of the political discourse. But as a continental European, it is not the quality of the discourse I want to examine here, only the rhetorical implications.
When you frame your safety dials as taxation, then rhetorically you are somewhat shooting yourself in the foot, if you want to proceed by arguing that these dials should not be thrown out of the discussion.
When re-framed as enforcement, the cost of using these safety dials suddenly does not sound as problematic anymore.
But enforcement, in a way that limits freedom of action, is indeed a burden to those at the receiving end, and if enforcement is too heavy they might seek to escape it altogether. I agree that perfectly inescapable watertight enforcement is practically nonexistent in this world; in fact, I consider its non-existence to be more of a desirable feature of society than a bug.
But to use your terminology, the level of enforcement applied to something is just one of these tradeoff dials that stink. That does not mean we should throw out the dial.