If we suppose AGIs do have minds, then alignment schemes can also use philosophical methods to address the values, goals, models, and behaviors of AGI. Such schemes would likely take the form of ensuring that updates to an AGI’s ontology and axiology converge on and maintain alignment with human interests (de Blanc, 2011; Armstrong, 2015).
This point is the key to the whole post, and it seems wrong to me. De Blanc 2011 and Armstrong 2015 both allow “non-mindful” models. I don’t know any alignment ideas that would apply only to “mindful” AIs.
Returning from our deeper thread to your original comment, can you classify the nature of your objection to this point? For example, would you posit that there is nothing we would classify as mental phenomena that is not already addressed by other methods? If so, that seems fine to me, because it reflects our uncertainty about the mental rather than a disagreement with this line of reasoning, in which we suppose there to be some things we might naively describe as mental, whatever the resolution of our uncertainty about the mental later tells us, if anything.
Hmm, what do you think it means for an AI to have a mind? To me this simply means it engages in mental phenomena, which is to say it is the subject of intentional phenomena. Perhaps I’ve not been clear enough about that here?
Also, I think there’s no way to make sense of either of those papers without appeal to the mental, since ontology and axiology (values) exist only within the mental. You could try to play at imagining an AI as if it had ontology and axiology without a mind, but then this would be anthropomorphization of the AI rather than treating the AI as it is.
FWIW I also plan to have more to say in future papers about alignment mechanisms that more explicitly depend on the mental aspect of mindful AGI.
Also, I think there’s no way to make sense of either of those papers without appeal to the mental
I think I can help.
Imagine a cellular automaton like Game of Life with an infinite grid of cells. Most cells are initially empty, except for a big block of cells which works as a computer. The computer is running the following program: enumerate many starting configurations of 10x10 Game of Life cells, trying to find a configuration that unfolds into exactly 100 gliders. After the computer finds such a configuration, it uses its “actuator” (another bunch of cells) to self-destruct and replace itself with that configuration.
On one hand, we’ve described something “non-mindful” that could be implemented right now without any difficulty in principle. On the other hand, it can be said to have an “ontology” (it thinks the world is an infinite grid of cells) and “values” (it wants 100 gliders). If you reread de Blanc’s paper, imagining that it’s talking about these toy “ontologies” and “values” instead of the squishy human kind, it will make perfect sense.
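To make the toy concrete, here’s a rough Python sketch of the kind of program I mean. It’s a sketch under stated assumptions, not a working glider hunter: only the Life step function is real, count_gliders is a placeholder for a pattern detector, and random sampling of 10x10 seeds stands in for “enumerate many starting configurations”.

import itertools
import random

def step(cells):
    # One Game of Life generation; cells is a set of live (x, y) coordinates.
    counts = {}
    for (x, y) in cells:
        for dx, dy in itertools.product((-1, 0, 1), repeat=2):
            if (dx, dy) != (0, 0):
                counts[(x + dx, y + dy)] = counts.get((x + dx, y + dy), 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in cells)}

def count_gliders(cells):
    # Placeholder: a real detector would match the glider's phases against cells.
    return -1

def search(trials=1000, steps=200, target=100):
    for seed in range(trials):
        rng = random.Random(seed)
        # A candidate 10x10 starting configuration.
        candidate = {(x, y) for x in range(10) for y in range(10) if rng.random() < 0.5}
        state = candidate
        for _ in range(steps):
            state = step(state)
        if count_gliders(state) == target:
            return candidate  # the "actuator" would now replace the computer with this pattern
    return None

print(search(trials=10))

The point of the sketch is just that the program’s entire “ontology” is the set of live cells and its entire “value” is the stopping test.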
This sounds dangerously like the same kind of failure-by-equivocation that plagued GOFAI. Just because we write a program that contains something we interpret as a representation of the world, or that acts in a way we interpret as goal-directed, does not mean the program actually has a representation with intention or action with telos. It also doesn’t mean that it doesn’t have those things (in fact I think it does, since my position on the metaphysics of phenomena is one of panpsychism, though that’s outside the scope of this paper), but what it does have is not necessarily what we often think of it as having based on our understanding of its internal workings.
To make this concrete, let’s consider an even simpler case: a loop that counts the number of ‘a’ characters it sees in a file:
acount = 0
with open("input.txt") as fd:  # "input.txt" is a placeholder file name
    while nc := fd.read(1):    # read one character; '' at end of file ends the loop
        if nc == 'a':
            acount += 1
When acount is incremented because nc contains an ‘a’, it is being causally linked to the state of the file. This doesn’t mean the program understands that fd contains acount ‘a’s, though, or even that acount is causally linked to the contents of fd; it only means that acount counts the number of ’a’s in fd, an interpretation we can make but the program itself cannot. So this thing properly has an ontology in some very weak sense, because it contains a thing that represents the world, but it’s the most minimal sort of such a thing, and one so simple that it is difficult to describe in words without accidentally ascribing it additional features it does not have.
Similarly, it has a purpose (which, I would argue, is the source of value), namely “count ’a’s until you reach the end of the file”, but this is the purpose it has as we would describe it. The program has no purpose to itself; under execution it is given purpose by the execution of individual instructions in a particular order that affect the state of a system, yet this is still a sort of purpose the program cannot express to itself, because ultimately the program has no disposition to understand its own telos. So, yes, it has a purpose, but not of the sort we would ascribe to a thing we could think of as having a mind, and thus we cannot see it as valuing anything; it just does stuff because that’s what it is, without regard to its own function.
So maybe we can make sense of those papers by applying our own interpretations to mindless systems, treating them as if they had ontologies and axiologies, but I view this as a mistake because it separates us from the systems’ own capacities and works from how we believe the systems to work, which may be correlated but are importantly different.
All I’m saying is, these papers weren’t intended to be only about “mindful” AI. (You could ask Peter or Stuart, but I’m pretty sure.) And the rest of your post relies on there being techniques that only work on “mindful” AI, so it kinda falls apart.
Hmm, I’m having a hard time figuring out what to do with this feedback. Yes, I suppose such mental-phenomena-assuming alignment techniques are possible, and I point to two examples of things that look a bit like research in this direction, even if you disagree that there is a way those things could work. But this does not seem to suggest the rest “falls apart”, since I am reasoning about likelihoods; instead it suggests you disagree with the order of magnitude of the likelihoods I assign and think my conclusion is not supportable because you think what I’m calling “philosophical techniques” are unlikely or unnecessary. That’s a somewhat different sort of critique than saying the argument falls apart because it hinged, say, on a proposition that is false.
Sorry if that seems nitpicky, but I’m just trying to make sure I understand the objection and respond to it appropriately.
I’m pretty sure these two papers work (or don’t work) regardless of mindful/non-mindful AI. They aren’t examples of mental-phenomena-assuming alignment techniques—they just use “ontology” and “values” as suggestive words for math, like “learning” in reinforcement learning. So it seems like there’s no evidence that mental-phenomena-assuming alignment techniques are possible.
Ah, okay. I think there is such evidence, and it doesn’t hinge on whether or not these two papers constitute evidence of it, but I don’t consider such arguments here. This suggests I should perhaps devote more time to establishing the feasibility of such an approach. That said, I think we have no strong evidence yet that mindless techniques will work either, so I mostly focused on giving evidence that mindless techniques are unlikely to work, bringing them below the prior probability of mindful techniques working (set to the probability that any class of techniques will work). Mindful techniques, by contrast, only have “evidence” against them in the form of arguments speculating about the nature of mental phenomena and arguing against its existence, which, as I point out, is something we are practically suspending resolution on here in order to make an argument; given sufficient uncertainty about it, we can’t use it to resolve the issue.
Of course if you look at the probability of the whole quest succeeding, it seems small either way, and distinguishing between different small probabilities is hard. But if you look at individual steps, we’ve made small but solid steps toward understanding “mindless” AI alignment (like the concept of adversarial examples), but no comparable advances in understanding “mindful” AI. So to me the weight of evidence is against your position.