To reduce potential duplication of effort, we highlight several existing research fields that can be utilized to address these problems (Section 6). We conclude with suggestions for future research (Section 7).
Sections 5 and 6, not 6 and 7
Informally, we can assume that some description of the world is given by context and view a task as something specified by an initial state and an end state (or states) - accomplishing the task amounts to causing a transformation from the starting state to one of the desired end states.
I feel like this definition is not capturing what I mean by a “task”. Many “agent-like” things, such as “become supreme ruler of the world”, seem like tasks according to this definition; many useless things like “twitching randomly” can be thought of as completing a “task” as defined here and so would be counted as “services”.
(I don’t have a much better definition, but I prefer to just use the intuitive notion of “task” rather than the definition you have.)
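To make the definition under discussion concrete, here is a minimal sketch of it in code; the Task/accomplishes names and the toy string-valued states are illustrative assumptions of mine, not anything from the original document. It also shows why the objection bites: "twitching randomly" fits the schema just as well as anything else.

```python
# Minimal sketch of the "initial state + desired end states" definition.
# All names and the toy string-valued states are illustrative only.
from dataclasses import dataclass
from typing import Callable, FrozenSet

State = str  # stand-in for whatever world-state representation is used


@dataclass(frozen=True)
class Task:
    initial: State
    end_states: FrozenSet[State]


def accomplishes(task: Task, transform: Callable[[State], State]) -> bool:
    # A transformation accomplishes the task iff it maps the initial
    # state into one of the desired end states.
    return transform(task.initial) in task.end_states


# The objection above: both of these count as valid "tasks" under the schema.
world_domination = Task("status quo", frozenset({"I rule the world"}))
twitching = Task("arm still", frozenset({"arm has twitched"}))
```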
A related challenge is the frame problem—identifying which consequences of accomplishing a task are important, and which can be ignored.
I’m pretty sure that’s not what the frame problem is. The frame problem is that by default you ignore consequences of your actions, and so you have to arduously specify all of the things that shouldn’t change.
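As a rough illustration of that reading of the frame problem (a toy example of my own, not from the document): an action specification that lists only its effects says nothing about the fluents it leaves alone, so a naive encoding needs an explicit "this does not change" statement for every unaffected fluent.

```python
# Toy illustration of the frame problem as described above; the fluent
# names and the open_door action are made up for the example.
state = {"door_open": False, "light_on": True, "cat_fed": False}

open_door_effects = {"door_open": True}  # only what the action changes

# Nothing above says what happens to "light_on" or "cat_fed"; a naive
# logical encoding would need one frame axiom per unaffected fluent:
frame_axioms = {f: v for f, v in state.items() if f not in open_door_effects}

next_state = {**open_door_effects, **frame_axioms}
assert next_state == {"door_open": True, "light_on": True, "cat_fed": False}
```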
Finally, this generates some utilities for the humans.
I’m surprised this matters in this model, continuing to read… ah okay, this model isn’t being used anywhere.
List of Research Questions
Looking at these, I feel like they are subquestions of “how do you design a good society that can handle technological development”—most of it is not AI-specific or CAIS-specific.
For me this is the main point of CAIS. It reframes many AI Safety problems in terms of “make a good society” problems, but now you can consider scenarios involving only AI. We can start to answer the question of “how do we make a good society of AIs?” with the question “How did we do it with humans?”. It seems like human society did not have great outcomes for everyone by default. Making human society function took a lot of work, and failed a lot of times. Can we learn from that and make AI Society fail less often or less catastrophically?
Yeah to be clear I do think it is worth it for people to be thinking about these problems from this perspective; I just don’t think they need to be AI researchers.
Yeah, I understand that. My point is that, in the same way that society didn’t work by default, systems of AI won’t work by default, and the interventions that will be needed will require AI researchers. That is, it’s not just about setting up laws, norms, contracts, and standards for managing these systems. It is about figuring out how to make AI systems that interact with each other the way humans do in the presence of laws, norms, standards, and contracts. Someone who is not an AI researcher would have no hope of solving this, since they cannot understand how AI systems will interact and cannot offer appropriate interventions.
It seems to me like you might each be imagining a slightly different situation.
Not quite certain what the difference is. But it seems like Michael is talking about setting up the mostly-or-only-AI parts of the system well. In my opinion, this requires AI researchers, in collaboration with experts from whatever area is getting automated. (So while it might not fall only under the umbrella of AI research, it critically requires it.) Whereas it seems to me that Rohin is talking more about ensuring that the (mostly) human parts of society do their job in the presence of automation. For example, how to deal with unemployment when parts of an industry get automated. (And I agree that I wouldn’t go looking for AI researchers when tackling this.)
Fixed the wrong section numbers and frame problem description.
Informally, we can assume that some description of the world is given by context and view a task as something specified by an initial state and an end state (or states) - accomplishing the task amounts to causing a transformation from the starting state to one of the desired end states.
I feel like this definition is not capturing what I mean by a “task”. Many “agent-like” things, such as “become supreme ruler of the world”, seem like tasks according to this definition; many useless things like “twitching randomly” can be thought of as completing a “task” as defined here and so would be counted as “services”.
Could it be that the problem is not in the “task” part but in the definition of “service”? If I consider the task of building me a house that I will like, I can envision a very service-like way of doing that (ask me a bunch of routine questions, select a house model accordingly, then proceed to build it in a cookbook manner by calling on other services). But I can also imagine going about this in a very agent-like manner.
(Also, “twitching randomly” seems like a perfectly valid task, and a twitch-bot like a perfectly valid service. Just a very stupid one that nobody would want to build or pay for. Uhm, probably. Hopefully.)
It seems like what you’re trying to get at is some notion of a difference between a service and an agent. My objection is primarily that the specific definitions you chose don’t seem to point at the essential differences between a service and an agent. I don’t have a strong opinion as to whether the problem is with the definition of “task” or of “service”; just that together they don’t seem to point at the right thing.
Looking at these, I feel like they are subquestions of “how do you design a good society that can handle technological development”—most of it is not AI-specific or CAIS-specific.
It is intentional that not all the problems are technical problems—for example, I expect that not tackling unemployment due to AI might indirectly make you a lot less safe (it seems prudent not to be in a civil war or war when you are attempting to finish building AGI). However, you are right that the list might nevertheless be too broad (and too loosely tied to AI).
Anyway: As a smaller point, I feel that most of the listed problems will get magnified as you introduce more AI services, or they might gain important twists. As a larger point: Am I correct to understand you as implying that “technical AI alignment researchers should primarily focus on other problems” (modulo qualifications)? My intuition is that this doesn’t follow, or at least that we might disagree on the degree to which this needs to be qualified to be true. However, I have not yet thought about this enough to be able to elaborate more right now :(.
A bookmark that seems relevant is the following prompt:
Conditional on your AI system never turning into an agent-like AGI, how is “not dying and not losing some % of your potential utility because of AI” different from “how do you design a good society that can handle the process of more and more things getting automated”?
(This should go with many disclaimers, first among those the fact that this is a prompt, not an implicit statement that I fully endorse.)
Am I correct to understand you as implying that “technical AI alignment researchers should primarily focus on other problems” (modulo qualifications)?
Kind of? I think it’s more like “these are indeed problems, and someone should focus on them, but I wouldn’t call it technical AI alignment” (and as a result, I wouldn’t call people working on them “technical AI alignment researchers”). For many of these problems, if I wanted to find people to work on them, I would not look for AI researchers (and instead look for economists, political theorists, etc).
Like, I kind of wish this document had been written without AI / AI safety researchers in mind.