and a lot of the problems stem from decision theories that are too smart.
Complex hostile subsystems won’t be developed by AI models without an optimization pressure that rewards them for doing so. This is, I think, a big chunk of the current schisms. We can’t know that a black box model isn’t deceiving us, in the same way we can’t know that the government isn’t hiding secret alien technology, but both can be extremely unlikely. In a way, what I am hearing is essentially an AGI “conspiracy theory”: that above a certain level of intelligence an AI model would be invisibly conspiring against us with no measurable sign. It is impossible to disprove, just as you cannot actually disprove that the government is secretly doing $conspiracy. (The unlikelihood scales with the number of people who would have to be involved, the cost, the benefit to the government, and the number of obvious crimes that the conspirators would have to stay silent about, depending on the conspiracy.)
My claim is somewhat different from the example you give. I’m not concerned with whether there exist useful tasks that allow factorization and myopia; assembly lines are an existence proof. I’m concerned about whether the majority of tasks/jobs, or the majority of the economic value that we want AI/AGI to be in, are factorizable this way, and whether they are compatible with a myopic setup.
Care to try to even think through the list from a high level? When I do this exercise I see nothing but factorable tasks everywhere, but part of the bias is that humans have to factor tasks. We are measurably more efficient as singletons. Examples include “all manufacturing”, “all resource gathering”, “all construction”, and “megascale biotech research”, all very separable tasks.
Per-episode myopia relies on you being able to detect how much optimization beyond the episode is occurring, which is harder than simply detecting whether any non-myopia exists, as per-step myopia allows.
Are you assuming online training? I was assuming offline training, with simulations auto-populated from online data that you then train on offline.
I find it very concerning that non-myopia was a strategy that Microsoft and other companies used successfully. That such companies are now worth billions of dollars and have thousands to tens of thousands of jobs suggests something troubling:
Microsoft products are rarely used in high-reliability systems anywhere for this reason. Not because human organizations are perfect, but because the selection is evolutionary: use Windows in a product that fails, and you lose money.
A counterexample to the factoring of tasks is given by Steven Byrnes:
For benefits of generality (4.3.2.1), an argument I find compelling is that if you’re trying to invent a new invention or design a new system, you need a cross-domain system-level understanding of what you’re trying to do and how. Like at my last job, it was not at all unusual for me to find myself sketching out the algorithms on a project and sketching out the link budget and scrutinizing laser spec sheets and scrutinizing FPGA spec sheets and nailing down end-user requirements, etc. etc. Not because I’m individually the best person at each of those tasks—or even very good!—but because sometimes a laser-related problem is best solved by switching to a different algorithm, or an FPGA-related problem is best solved by recognizing that the real end-user requirements are not quite what we thought, etc. etc. And that kind of design work is awfully hard unless a giant heap of relevant information and knowledge is all together in a single brain / world-model.
Fair, though it is separable.
Take the task of designing something like a car’s internals.
You might start with a rough idea of the specs, and a precise equation for the value of each feature. You have a scaled model for how it needs to look.
You start a search process where you consider many possible ways to arrange the components within the body shell. Say none of the configurations will fit and meet specs.
You send a request up the stack for a scaled up version of the shell. You get it. You arrange the components into possible designs that fit, and then send the candidate design for simulated testing.
The simulated testing reveals a common failure in one of the parts, and all of the available alternatives for that part have a flaw. So you send a request to the “part designer” to give you a part that satisfies these new tightened specs that will not allow the flaw, and ask for a range of alternate packages.
The resulting redesigned part is now too big to fit, so you rearrange the parts again or send a request to the body shell designer for even more space, and so on.
It is many, many iterative interactions where the flow of the process has to go up and down the stack many times. In addition, I am describing the flow for only one design candidate. It’s actually a large tree of other candidates you should be checking, where each time there was a choice you queue up a message to the next stage for each possible choice you could have made (and prune the worst ones from all the packages in flight in the system).
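A rough sketch of that branch-and-prune loop, in Python. Everything here is hypothetical and only for illustration: the `Candidate` type, the `expand_and_prune` function, the stage options, and the additive cost table are all made up, and a real design search would have far richer scoring. It just shows "queue one candidate per possible choice, then drop the worst ones in flight" at each stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    choices: Tuple[str, ...]  # one option picked per stage so far
    score: float              # hypothetical cost, lower is better


def expand_and_prune(
    stages: Sequence[Sequence[str]],
    cost: Callable[[Tuple[str, ...]], float],
    keep: int,
) -> List[Candidate]:
    """Branch on every choice at each stage, keeping only the `keep` best."""
    frontier = [Candidate((), 0.0)]
    for options in stages:
        # Branch: one new in-flight candidate per possible choice.
        branched = [
            Candidate(c.choices + (opt,), cost(c.choices + (opt,)))
            for c in frontier
            for opt in options
        ]
        # Prune: drop the worst candidates currently in flight.
        frontier = sorted(branched, key=lambda c: c.score)[:keep]
    return frontier


# Hypothetical per-option costs, assumed additive purely for the example.
COST = {
    "small_shell": 1.0, "large_shell": 2.0,
    "part_a": 3.0, "part_b": 1.5,
    "layout_1": 0.5, "layout_2": 0.7,
}
STAGES = [
    ("small_shell", "large_shell"),
    ("part_a", "part_b"),
    ("layout_1", "layout_2"),
]

survivors = expand_and_prune(STAGES, lambda ch: sum(COST[o] for o in ch), keep=2)
```

With `keep=2`, only two candidates are ever in flight after each stage, so the work grows with the number of stages rather than with the full tree of permutations. The price is that pruning early can discard a branch that would have turned out best later.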
If you think about how to implement this, one way is data driven. All the “roles” in this network sit there quiescent, waiting for an initial data package. All the context of the process is in the message itself. There is no agent “responsible” for the car design getting finished, only a message flow pipeline where after some time you will get valid car design alternatives in the ‘in box’ of the system that sent the request, or a message stating that the process failed from an intractable problem (there were constraints that could not be satisfied after exhausting every design permutation).
There is no reason these roles cannot be superintelligences, but they get no context. They don’t think or have an internal narrative; they wait forever for a message, but apply superhuman and general skill when given the task. They are stateless microservices, though as they do have superintelligence-level neural architectures, they are too fat to be called ‘micro’.
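To make the "stateless roles, all context in the message" idea concrete, here is a minimal toy sketch in Python. The role names (`layout`, `shell_designer`), the `Message` fields, and the "volume" arithmetic are assumptions invented for the example, not a real design system. Each role is a pure function from one message to new messages, and nothing persists between calls.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Message:
    to: str                      # role (or terminal mailbox) this is addressed to
    shell_size: int              # current body-shell volume, arbitrary units
    part_sizes: Tuple[int, ...]  # volumes of the parts that must fit
    max_shell: int               # hard constraint set by the requester


def layout(msg: Message) -> List[Message]:
    # Stateless: decides using only what is in the message.
    if sum(msg.part_sizes) <= msg.shell_size:
        return [replace(msg, to="outbox")]      # a valid design
    return [replace(msg, to="shell_designer")]  # ask up the stack for more room


def shell_designer(msg: Message) -> List[Message]:
    bigger = msg.shell_size + 1
    if bigger > msg.max_shell:
        return [replace(msg, to="failed")]      # constraint cannot be satisfied
    return [replace(msg, to="layout", shell_size=bigger)]


ROLES: Dict[str, Callable[[Message], List[Message]]] = {
    "layout": layout,
    "shell_designer": shell_designer,
}


def run(initial: Message) -> List[Message]:
    """Deliver messages until only terminal ones ("outbox"/"failed") remain."""
    queue = deque([initial])
    finished: List[Message] = []
    while queue:
        msg = queue.popleft()
        handler = ROLES.get(msg.to)
        if handler is None:      # no role with this name: terminal mailbox
            finished.append(msg)
        else:
            queue.extend(handler(msg))
    return finished


ok = run(Message("layout", shell_size=5, part_sizes=(3, 4), max_shell=10))
bad = run(Message("layout", shell_size=5, part_sizes=(3, 4), max_shell=6))
```

In the first run the request bounces between the two roles until the shell is large enough, and a finished design lands in the outbox. In the second the size limit is hit and a failure message comes back instead. Neither role keeps any memory of earlier requests; the dispatcher loop is the only thing that moves the process forward.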