Why the downvotes? Do people feel that “the FAI should at some point fold up and vanish out of existence” is so obvious that it’s not worth pointing out? Or disagree that the FAI should in fact do that? Or feel that it’s wrong to point this out in the context of Manfred’s comment? (I didn’t mean to suggest that Manfred disagrees with this, but felt that his comment was giving the wrong impression.)
Will sentient, self-interested agents ever be free from the existential risks of UFAI/intelligence amplification without some form of oversight? It’s nice to think that humanity will grow up and learn how to get along, but even if that’s true for 99.9999999% of humans, that leaves 7 people from today’s population who would probably have the power to trigger their own UFAI hard takeoff after an FAI fixes the world and then disappears. Even if such a disaster could be stopped, it is a risk probably worth the cost of keeping some form of FAI around indefinitely. What FAI becomes is anyone’s guess, but the need for what FAI does will probably not go away. If we can’t trust humans to do FAI’s job now, I don’t think we can trust humanity’s descendants to do FAI’s job either, just from Löb’s theorem. I think it is unlikely that humans will become enough like FAI to properly do FAI’s job. They would essentially give up their humanity in the process.
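For anyone checking the arithmetic behind the “7 people” figure, here is a minimal sketch. It assumes a world population of roughly 7 billion (an assumption, not a number stated in the comment), and the function name is made up for illustration.

```python
from fractions import Fraction


def expected_exceptions(population: int, dependable_fraction: Fraction) -> Fraction:
    """Number of people left over if `dependable_fraction` of the population is dependable."""
    return population * (1 - dependable_fraction)


# Assumption: roughly 7 billion people, which is what makes the comment's figure come out to 7.
POPULATION = 7_000_000_000
# 99.9999999% written as an exact fraction to avoid floating-point rounding.
DEPENDABLE = Fraction(999_999_999, 1_000_000_000)

print(expected_exceptions(POPULATION, DEPENDABLE))
```

Exact fractions are used here only because a float with that many nines loses precision when subtracted from 1.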
A secure operating system for governed matter doesn’t need to take the form of a powerful optimization process, nor does verification of transparent agents trusted to run at root level. Benja’s hope seems reasonable to me.
A secure operating system for governed matter doesn’t need to take the form of a powerful optimization process
This seems non-obvious. (So I’m surprised to see you state it as if it was obvious. Unless you already wrote about the idea somewhere else and are expecting people to pick up the reference?) If we want the “secure OS” to stop posthumans from running private hell simulations, it has to determine what constitutes a hell simulation and successfully detect all such attempts despite superintelligent efforts at obscuration. How does it do that without being superintelligent itself?
nor does verification of transparent agents trusted to run at root level
This sounds interesting but I’m not sure what it means. Can you elaborate?
Hm, that’s true. Okay, you do need enough intelligence in the OS to detect certain types of simulations and/or the intention to build such simulations, however obscured.
If you can verify an agent’s goals (and competence at self-modification), you might be able to trust zillions of different such agents to all run at root level, depending on what the tiny failure probability worked out to quantitatively.
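One way to make “what the tiny failure probability worked out to quantitatively” concrete is the following rough sketch. It assumes that each verified agent fails independently with the same probability p, which is an assumption added here rather than anything claimed in the thread; the function name and the example numbers are likewise hypothetical. Under that assumption, the chance that at least one of N agents fails is 1 - (1 - p)^N.

```python
import math


def prob_any_failure(p: float, n: float) -> float:
    """Probability that at least one of n agents fails, assuming independent
    failures that each occur with probability p (an illustrative assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if p == 1.0:
        return 1.0 if n > 0 else 0.0
    # 1 - (1 - p)**n, computed with log1p/expm1 so tiny p does not round to zero.
    return -math.expm1(n * math.log1p(-p))


# Hypothetical numbers: a trillion agents, each with a 1e-30 chance of going wrong.
print(prob_any_failure(1e-30, 1e12))
# Same per-agent risk, but with 1e30 agents the total risk is no longer small.
print(prob_any_failure(1e-30, 1e30))
```

The point of the sketch is only that the answer depends on the product of N and p: when N times p is far below 1 the total risk is about N times p, and when it approaches 1 the risk stops being tiny.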
If you can verify an agent’s goals (and competence at self-modification), you might be able to trust zillions of different such agents to all run at root level, depending on what the tiny failure probability worked out to quantitatively.
That means each non-trivial agent would become the FAI for its own resources. To see the necessity of this imagine what initial verification would be required to allow an agent to simulate its own agents. Restricted agents may not need a full FAI if they are proven to avoid simulating non-restricted agents, but any agent approaching the complexity of humans would need the full FAI “conscience” running to evaluate its actions and interfere if necessary.
EDIT: “interfere” is probably the wrong word. From the inside the agent would want to satisfy the FAI goals in addition to its own. I’m confused about how to talk about the difference between what an agent would want and what an FAI would want for all agents, and how it would feel from the inside to have both sets of goals.
I’d hope so, since I think I got the idea from you :-)
This is tangential to what this thread is about, but I’d add that I think it’s reasonable to have hope that humanity will grow up enough that we can collectively make reasonable decisions about things affecting our then-still-far-distant future. To put it bluntly, if we had an FAI right now I don’t think it should be putting a question like “how high is the priority of sending out seed ships to other galaxies ASAP” to a popular vote, but I do think there’s reasonable hope that humanity will be able to make that sort of decision for itself eventually. I suppose this is down to definitions, but I tend to visualize FAI as something that is trying to steer the future of humanity; if humanity eventually takes on the responsibility for this itself, then even if for whatever reason it decides to use a powerful optimization process for the special purpose of preventing people from building uFAI, it seems unhelpful to me to gloss this without more qualification as “the friendly AI [… will always …] stop unsafe AIs from being a big risk”, because the latter just sounds to me like we’re keeping around the part where it steers the fate of humanity as well.
I do agree that it seems quite likely that even in the long run, we may not want to modify ourselves so that we are perfectly dependable, because it seems like that would mean getting rid of traits we want to keep around. That said, I agree with Eliezer’s reply about why this doesn’t mean we need to keep an FAI around forever; see also my comment here.
I don’t think Löb’s theorem enters into it. For example, though I agree that it’s unlikely that we’d want to do so, I don’t believe Löb’s theorem would be an obstacle to modifying humans in a way making them super-dependable.
Thanks for explaining the reasoning!