If I’ve understood you correctly, you consider your only major delta with Eliezer Yudkowsky to be whether natural abstractions basically always work or, to put it in different terms, whether they reliably exist in a harnessable form. Is that a fair restatement?
If so, I’m (specifically) a little surprised that that’s all. I would have expected that whatever reasoning the two of you did differently, or whatever evidence the two of you weighted differently (or whatever else), would also have given you some other (likely harder to pin down) generative disagreements. (Otherwise, maybe it’s just some really narrow, really strong evidence that one of you saw and the other didn’t?)
Maybe that’s just second-order though. But I would still like to hear what the delta between NADoom!John and EY still is, if there is one. If there isn’t, that’s surprising, too, and I’d be at least a little tempted to see what pairs of well-regarded alignment researchers still seem to agree on (and then if there are nonobvious commonalities there).
Also, to step back from the delta a bit here:
Why are you as confident as you are—more confident than the median alignment researcher, I think—about natural abstractions existing to a truly harnessable extent?
What makes you ~85% sure that even really bizarrely[1] trained AIs will have internal ontologies that humanish ontologies robustly and faithfully map into? Are there any experiments, observations, maxims, facts, or papers you can point to?
What non-obvious things could you see that would push that ~85% up or down? What reasonably plausible (>1-2%, say) near-future occurrences would kill off the largest blocks of your assigned probability mass there?
For all we know, all our existing training methods are really good at producing AIs with alien ontologies, and there’s some really weird unexpected procedure you need to follow that does produce nice ontology-sharing aligned-by-default AIs. I wouldn’t call it likely, but if we feel up to positing that possibility at all, we should also be willing to posit the reverse.