I’m pretty sure these two papers work (or don’t work) regardless of whether the AI is mindful or non-mindful. They aren’t examples of mental-phenomena-assuming alignment techniques; they just use “ontology” and “values” as suggestive words for math, the way “learning” is used in reinforcement learning. So it seems there’s no evidence that mental-phenomena-assuming alignment techniques are possible.
Ah, okay. I think there is such evidence, and it doesn’t hinge on whether these two papers constitute it, but I don’t consider those arguments here. This suggests I should perhaps devote more time to establishing the feasibility of such an approach. That said, we have no strong evidence yet that mindless techniques will work either, so I mostly focused on giving evidence that mindless techniques are unlikely to work, bringing their probability below the prior probability of mindful techniques working (which I set to the probability that any given class of techniques will work). Mindful techniques, by contrast, have “evidence” against them only in the form of arguments speculating about the nature of mental phenomena and arguing against its existence, and that is a question we are deliberately suspending here: given sufficient uncertainty about it, we can’t use it to resolve the issue.
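In symbols (my own formalization, nothing standard): let p be the prior probability that any given class of techniques works, and let E be the evidence I give against mindless techniques.

```latex
% Shared prior: neither class starts out favored.
P(\text{mindless works}) = P(\text{mindful works}) = p
% E bears only on the mindless class, so conditioning on it gives:
P(\text{mindless works} \mid E) < p = P(\text{mindful works} \mid E)
```

The comparison goes through even though p itself stays unknown.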
Of course, if you look at the probability of the whole quest succeeding, it seems small either way, and distinguishing between different small probabilities is hard. But if you look at individual steps, we’ve made small but solid progress toward understanding “mindless” AI alignment (like the concept of adversarial examples), but no comparable advances in understanding “mindful” AI. So to me the weight of evidence is against your position.
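As a concrete illustration of what I mean by adversarial examples, here is a minimal fast-gradient-sign-method sketch; the toy linear classifier and random data are placeholders I’m assuming for illustration, not anything from this exchange:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)   # stand-in classifier (placeholder)
x = torch.randn(1, 4)           # stand-in input
y = torch.tensor([0])           # stand-in label

# Compute the loss gradient with respect to the input, not the weights.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()

# FGSM: step a small distance in the direction that most increases the loss.
eps = 0.25
x_adv = (x_adv + eps * x_adv.grad.sign()).detach()

# The predicted labels for x and x_adv may now differ, despite the
# perturbation being small.
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```

The point is that a tiny, targeted perturbation can flip the model’s output, which is the kind of concrete, checkable phenomenon the “mindless” framing has produced.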