>The central challenge is to get enough bits-of-information about human values to narrow down a search-space to solutions compatible with human values.
This sounds like a fundamental disagreement with Yudkowsky’s view. (I think) Yudkowsky thinks the hardest part about alignment is getting an AGI to do any particular specified thing (that requires superhuman general intelligence) at all, whatever it may be, whereas by default AGI will optimize hard for something that no programmer had in mind; rather than the problem being about pointing at particular values. Do you recognize this as a disagreement, and what do you think of it? Do you think aiming-at-all is not that hard, or isn’t usefully separated from pointing at human values?
I think these are both pointing to basically-the-same problem. Under Yudkowsky’s view, it’s presumably not hard to get an AI to do X for all values of X, but it’s hard for most of the X which humans care about, and it’s hard for most of the things which seem like human-intuitive “natural things to do”.
Huh. I thought Yudkowsky’s view was that it’s hard to get an AGI to do X for all values of X, where X is the final effect of the AGI on the world (like, what the universe looks like when the AI is done doing its thing). If X is instead an instrumental sort of thing, like getting a lot of energy and matter, then it’s not hard to get an AGI to do that.
That’s right.
So “get enough bits-of-information about human values” makes sense if you have something you can do with the bits, i.e. narrow down something. If we don’t know how to specify any final effect of an AGI at all, then we have an additional problem, which is that we don’t know how to do anything with the bits of information about which final effects we want.
I mean, yeah, we do need to be able to use the bits to narrow down a search space.
What’s the search space? Policies, or algorithms, or behaviors, or something. What’s the information? Well, basically pointing a camera at anything in the world today gives you information about human values, or reading anything off the internet. What do we do with this information to get policies we like? The bits of information aren’t the problem; the problem is that we don’t know how to narrow down policy space or algorithm space or behavior space so that what comes out has some particular final result. Getting bits of information about human values, and being able to aim an AGI at anything, are different problems.
I think these are the same problem? Like, ability-to-narrow-down-a-search-space-or-behavior-space-by-a-factor-of-two is what a bit of information is. If we can’t use the information to narrow down a search space closer to the thing-the-information-is-supposedly-about, then we don’t actually have any information about that thing.
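[To make the “factor of two” bookkeeping explicit (a standard Shannon-style illustration, not anything specific to alignment): picking one option out of $N$ equally likely candidates takes $\log_2 N$ bits, so each bit of information halves the remaining candidate set.]

$$\text{bits needed} = \log_2 N, \qquad N \;\xrightarrow{\,1\text{ bit}\,}\; \frac{N}{2} \;\xrightarrow{\,1\text{ bit}\,}\; \frac{N}{4} \;\xrightarrow{\,1\text{ bit}\,}\; \cdots$$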
>Like, ability-to-narrow-down-a-search-space-or-behavior-space-by-a-factor-of-two is what a bit of information is.
Information is an upper bound, not a lower bound. The capacity of a channel gives you an upper bound on how many distinct messages you can send, not a lower bound on your performance on some task using messages sent over the channel. If you have a very high info-capacity channel with someone who speaks a different language from you, you don’t have an informational problem, you have some other problem (a translation problem).
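[For reference, the standard result being invoked here: a channel’s Shannon capacity is the maximum mutual information between input and output over all input distributions, and the noisy-channel coding theorem says rates below that capacity are achievable with arbitrarily small error while rates above it are not. It upper-bounds how many distinct messages can get through per channel use; it says nothing about what the receiver can do with them.]

$$C = \max_{p(x)} I(X;Y)$$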
>If we can’t use the information to narrow down a search space closer to the thing-the-information-is-supposedly-about, then we don’t actually have any information about that thing.
This seems to render the word “information” equivalent to “what we know how to do”, which is not the technical meaning of information. Do you mean to do that? If so, why? It seems like a misframing of the problem, because what’s hard about the problem is that you don’t know how to do something, and don’t know how to gather data about how to do that thing, because you don’t have a clear space of possibilities with a shattering set of clear observable implications of those possibilities. When you don’t know how to do something and don’t have a clear space of possibilities, the sort of pieces of progress you want to make aren’t fungible with each other the way information is fungible with other information.
[ETA: Like, if the space in question is the space of which “human values” is a member, then I’m saying, our problem isn’t locating human values in that space, our problem is that none of the points in the space are things we can actually implement, because we don’t know how to give any particular values to an AGI.]
The Shannon formula doesn’t define what information is; it quantifies the amount of information. People occasionally point this out as being kind of philosophically funny: we know how to measure the amount of information, but we don’t really have a good definition of what information is. Talking about what information is immediately runs into questions about what the information is about, how the information relates to the thing(s) it’s about, etc.
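[The formula in question, for reference: the Shannon entropy of a random variable depends only on its probability distribution, so it assigns an amount of information while saying nothing about what the outcomes refer to.]

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x)$$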
Those are basically similar to the problems one runs into when talking about e.g. an AI’s objective and whether it’s “aligned with” something in the physical world. Like, this mathematical function (the objective) is supposed to talk about something out in the world, presumably it should relate to those things in the world somehow, etc. I claim it’s basically the same problem: how do we get symbolic information/functions/math-things to reliably “point to” particular things in the world?
(This is what Yudkowsky, IIUC, would call the “pointer problem”.)
Framed as a bits-of-information problem, the difficulty is not so much getting enough bits as getting bits which are actually “about” “human values”. (Presumably that’s why my explanations seem so confusing.)
If natural abstractions are a thing, in what sense is “make this AGI have particular effect X” trying to be about human values, if X is expressed using natural abstractions?
In that case, it’s not about human values, which is one of the very nice things the natural abstraction hypothesis buys us.