A secure operating system for governed matter doesn’t need to take the form of a powerful optimization process, nor does verification of transparent agents trusted to run at root level. Benja’s hope seems reasonable to me.
A secure operating system for governed matter doesn’t need to take the form of a powerful optimization process
This seems non-obvious. (So I’m surprised to see you state it as if it were obvious. Unless you already wrote about the idea somewhere else and are expecting people to pick up the reference?) If we want the “secure OS” to stop posthumans from running private hell simulations, it has to determine what constitutes a hell simulation and successfully detect all such attempts despite superintelligent efforts at obscuration. How does it do that without being superintelligent itself?
nor does verification of transparent agents trusted to run at root level
This sounds interesting but I’m not sure what it means. Can you elaborate?
Hm, that’s true. Okay, you do need enough intelligence in the OS to detect certain types of simulations, and/or the intention to build such simulations, however obscured.
If you can verify an agent’s goals (and competence at self-modification), you might be able to trust zillions of different such agents to all run at root level, depending on what the tiny failure probability worked out to quantitatively.
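To put a rough number on “tiny” (a back-of-the-envelope sketch, with the specific figures made up purely for illustration): if each verified agent independently has probability p of its goals actually being wrong, then the chance that at least one of N agents fails is 1 − (1 − p)^N ≈ N·p when N·p is small. So if something like 10^21 agents were running at root level and we wanted the overall chance of even one failure to stay below, say, one in a million, the verification would have to deliver p on the order of 10^−27.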
That means each non-trivial agent would become the FAI for its own resources. To see the necessity of this, imagine what initial verification would be required before an agent could be allowed to simulate its own agents. Restricted agents may not need a full FAI if they are proven to avoid simulating non-restricted agents, but any agent approaching the complexity of humans would need the full FAI “conscience” running to evaluate its actions and interfere if necessary.
EDIT: “interfere” is probably the wrong word. From the inside the agent would want to satisfy the FAI goals in addition to its own. I’m confused about how to talk about the difference between what an agent would want and what an FAI would want for all agents, and how it would feel from the inside to have both sets of goals.
I’d hope so, since I think I got the idea from you :-)
This is tangential to what this thread is about, but I’d add that I think it’s reasonable to have hope that humanity will grow up enough that we can collectively make reasonable decisions about things affecting our then-still-far-distant future. To put it bluntly, if we had an FAI right now, I don’t think it should be putting a question like “how high a priority is sending out seed ships to other galaxies ASAP” to a popular vote, but I do think there’s reasonable hope that humanity will be able to make that sort of decision for itself eventually. I suppose this is down to definitions, but I tend to visualize FAI as something that is trying to steer the future of humanity. If humanity eventually takes on that responsibility itself, then even if it decides for whatever reason to use a powerful optimization process for the special purpose of preventing people from building uFAI, it seems unhelpful to me to gloss this without more qualification as “the friendly AI [… will always …] stop unsafe AIs from being a big risk”, because the latter just sounds to me like we’re keeping around the part where it steers the fate of humanity as well.