The central crux really isn’t where values are generated. That’s a more or less trivial aside. (Though my claim was simply that it’s implausible that the values aimed for would be entirely determined by the genome + learning process; that’s a very weak claim: 98% determined is [not entirely determined].)
The crux is the tautology issue: I’m saying there’s nothing to explain, since the source of information we have on [what values are being “aimed for”] is human behaviour, and the source of information we have on [what values are achieved] is human behaviour.
These things must agree with one another: the learning process that produced human values produces human values. From an alignment-difficulty perspective, that’s enough to conclude that there’s nothing to learn here.
An argument of the form [f(x) == f(x), therefore y] is invalid. f(x) might be interesting for other reasons, but that does nothing to rescue the argument.
The crux is the tautology issue: I’m saying there’s nothing to explain, since the source of information we have on [what values are being “aimed for”] is human behaviour, and the source of information we have on [what values are achieved] is human behaviour.
That’s where we disagree: we have more information than that. I agree human behavior plays a role in my evidence base, but it’s not the only evidence I have.
In particular, I am using results from both ML/AI and human brain studies to inform my conclusion.
Basically, my claim is of the form [f(x) == f(y), therefore z].
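To make the contrast between the two argument forms concrete, here is a minimal Lean sketch (the names `f`, `x`, `y` are illustrative placeholders, not anything specific to values or behaviour): a claim of the form f(x) == f(x) is provable with no information about f at all, whereas f(x) == f(y) for distinct inputs only follows from some substantive assumption about f, i.e. from evidence beyond the statement itself.

```lean
-- Illustrative placeholders only: `f`, `x`, `y` stand in for the abstract
-- argument forms above, not for any specific claim about values.

-- [f(x) == f(x)]: a tautology. It is proved by `rfl` alone, using no facts
-- about `f` or `x`, so it cannot support any further conclusion `y`.
example {α β : Type} (f : α → β) (x : α) : f x = f x := rfl

-- [f(x) == f(y)]: a substantive claim. It only follows from an actual
-- hypothesis about `f` (here, the assumption that `f` is constant),
-- i.e. from additional information about `f` itself.
example {α β : Type} (f : α → β) (x y : α)
    (h : ∀ a b : α, f a = f b) : f x = f y :=
  h x y
```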