I don’t think I buy the argument for why process-based optimization would be an attractor. The proposed mechanism—an evaluator maintaining an “invariant that each component has a clear role that makes sense independent of the global objective”—would definitely achieve this, but why would the system maintainers add such an invariant? In any concrete deployment of a process-based system, they would face strong pressure to optimize end-to-end for the outcome metric.
I think the way process-based systems could actually win the race is something closer to “network effects enabled by specialization and modularity”. Let’s say you’re building a robotic arm. You could use a neural network optimized end-to-end to map input images into a vector of desired torques, or you could use a concatenation of a generic vision network and a generic action network, with a common object representation in between. The latter is likely to be much cheaper because the generic network training costs can be amortized across many applications (at least in an economic regime where training cost dominates inference cost). We see a version of this in NLP where nobody outside the big players trains models from scratch, though I’m not sure how to think about fine-tuned models: do they have the safety profile of process-based systems or outcome-based systems?
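The modular architecture described above can be sketched in a few lines. Everything here is hypothetical (module names, dimensions, the linear maps standing in for trained networks); the point is only that the two modules meet at a shared object-representation interface, so each can be trained and amortized independently, unlike an end-to-end image-to-torque network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a flattened camera image, a shared object
# representation (pose, size, ...), and a torque vector for the arm's joints.
IMAGE_DIM, OBJECT_DIM, TORQUE_DIM = 64, 8, 6

class VisionModule:
    """Generic module: image -> object representation (trained once, reused widely)."""
    def __init__(self):
        self.W = rng.normal(size=(OBJECT_DIM, IMAGE_DIM)) * 0.1
    def __call__(self, image):
        return self.W @ image

class ActionModule:
    """Generic module: object representation -> joint torques."""
    def __init__(self):
        self.W = rng.normal(size=(TORQUE_DIM, OBJECT_DIM)) * 0.1
    def __call__(self, obj):
        return self.W @ obj

# The process-based system is a composition through the shared interface.
# An end-to-end network of the same overall shape would fuse these into one
# map whose internals have no role independent of the global objective.
vision, action = VisionModule(), ActionModule()

def modular_policy(image):
    return action(vision(image))

torques = modular_policy(rng.normal(size=IMAGE_DIM))
assert torques.shape == (TORQUE_DIM,)
```

Swapping in an improved `VisionModule` leaves `ActionModule` untouched, which is where the amortization of training cost comes from.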
Optimizing for the outcome metric alone on some training distribution, without any insight into the process producing that outcome, runs the risk that the system won’t behave as desired when out-of-distribution. This is probably a serious concern to the system maintainers, even ignoring (largely externalized) X-risks.
I understand Ivan’s first point. My main concern is that we don’t yet have the right processes laid out for these models to follow. Ultimately, we want these models to determine their own process for doing things (unless we find a way to emulate human brain processes in machines), and prescribing a clear-cut process for every task could limit the model’s creativity. We would need a near-perfect model of how each of these NN tasks should be carried out.
However, the idea of combining the two is interesting. Research suggests that backprop and a global update function don’t exist in the brain (although large sections of the brain can carry out orchestrated tasks remarkably well). There is presumably some combination of local updates to synaptic weights (aligned with specific process-based tasks) that nonetheless tracks something like a global loss function. It’d be interesting to get more thoughts on this.
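A toy sketch of that combination, with no claim to biological realism: the hidden layer’s weights are updated only by a local objective (reconstructing the layer’s own input), while the global loss reaches only the readout weights via a plain delta rule, so no error is ever backpropagated end-to-end. The task, dimensions, and learning rates are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression task: learn y = x1 + x2.
X = rng.normal(size=(200, 2))
y = X.sum(axis=1, keepdims=True)

W1 = rng.normal(size=(2, 8)) * 0.5   # "synaptic" weights of the local module
D  = rng.normal(size=(8, 2)) * 0.5   # local decoder, used only for W1's update
W2 = rng.normal(size=(8, 1)) * 0.5   # readout trained on the global loss

def mse(a, b):
    return float(np.mean((a - b) ** 2))

initial = mse(np.tanh(X @ W1) @ W2, y)

lr = 0.05
for _ in range(500):
    h = np.tanh(X @ W1)

    # Global signal reaches only the readout (delta rule); the global
    # error is never backpropagated through W2 into W1.
    err = h @ W2 - y
    W2 -= lr * h.T @ err / len(X)

    # W1 follows a purely local objective: reconstruct the module's own
    # input. This stands in for a local, process-aligned synaptic rule.
    rec_err = h @ D - X
    dh = rec_err @ D.T * (1 - h ** 2)
    D  -= lr * h.T @ rec_err / len(X)
    W1 -= lr * X.T @ dh / len(X)

final = mse(np.tanh(X @ W1) @ W2, y)
assert final < initial  # global loss falls despite only local hidden updates
```

The interesting property is that the global loss still decreases even though the hidden layer never sees it, which is a crude analogue of local updates that "follow" a global objective.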