All I’m saying is, these papers weren’t intended to be only about “mindful” AI. (You could ask Peter or Stuart, but I’m pretty sure.) And the rest of your post relies on there being techniques that only work on “mindful” AI, so it kinda falls apart.
Hmm, I’m having a hard time figuring out what to do with this feedback. Yes, I do suppose such mental-phenomena-assuming alignment techniques are possible, and I point to two examples of things that look a bit like research in this direction, even if you disagree that those things could work. But that doesn’t mean the rest “falls apart,” because I am reasoning about likelihoods: it suggests you disagree with the order of magnitude of the likelihoods I assign, and think my conclusion is unsupportable because you consider what I’m calling “philosophical techniques” unlikely or unnecessary. That’s a somewhat different sort of critique than saying the argument falls apart because it hinged on, say, a proposition that is false.
Sorry if that seems nitpicky, but I’m just trying to make sure I understand the objection and respond to it appropriately.
I’m pretty sure these two papers work (or don’t work) regardless of mindful/non-mindful AI. They aren’t examples of mental-phenomena-assuming alignment techniques—they just use “ontology” and “values” as suggestive words for math, like “learning” in reinforcement learning. So it seems like there’s no evidence that mental-phenomena-assuming alignment techniques are possible.
Ah, okay. I think there is such evidence, and it doesn’t hinge on whether or not these two papers constitute examples of it, but I don’t take up those arguments here. This suggests I should perhaps devote more time to establishing the feasibility of such an approach. That said, I think we have no strong evidence yet that mindless techniques will work either, so I mostly focused on giving evidence that mindless techniques are unlikely to work, pushing them below the prior probability of mindful techniques working (a prior I set to the probability that any class of techniques will work). Mindful techniques, by contrast, only have “evidence” against them in the form of arguments that speculate about the nature of mental phenomena and argue against their existence, and as I point out, that’s a question we’re practically suspending judgment on here: we’re uncertain enough about it that we can’t use it to resolve the issue.
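To make the shape of that comparison explicit, here is one way to write it down. The symbols are mine and purely illustrative, not quantities from the post; p stands for the prior probability that any given class of alignment techniques works.

```latex
% Illustrative sketch only; "p", "mindless", and "mindful" are my own
% shorthand for the shared prior and the two classes of techniques.
\begin{align*}
  P(\text{mindless techniques work} \mid \text{arguments in the post}) &< p \\
  P(\text{mindful techniques work} \mid \text{admissible arguments}) &\approx p \\
  \Rightarrow\quad P(\text{mindful techniques work}) &> P(\text{mindless techniques work})
\end{align*}
```

The second line holds only because the arguments against mental phenomena are being set aside as unresolved, which is exactly the point of contention.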
Of course if you look at the probability of the whole quest succeeding, it seems small either way, and distinguishing between different small probabilities is hard. But if you look at individual steps, we’ve made small but solid progress toward understanding “mindless” AI alignment (the concept of adversarial examples, for example), and no comparable advances in understanding “mindful” AI. So to me the weight of evidence is against your position.