I will propose a slight modification to the definition of closed-loop offered, not to be pedantic but to help align the definition with the risks proposed.
A closed-loop system generally incorporates inputs, an arbitrary function that translates inputs to outputs (such as a model or agent), the outputs themselves, and some evaluation of the output's efficacy against defined objectives. This evaluation might be referred to as a loss function, cost function, utility function, reward function, or objective function; for simplicity, let's just call it the evaluation.
The defining characteristic of a closed-loop system is that this evaluation is fed back into the input channel, not just the output of the function.
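To make that structure concrete, here is a minimal Python sketch of the loop just described; `f`, `objective`, and the environment interface are hypothetical placeholders, not any particular system's API.

```python
# Minimal sketch of a closed-loop system: the evaluation of each
# output is fed back into the input channel on the next step.
# `f`, `objective`, and `env` are hypothetical placeholders.

def closed_loop(f, objective, env, steps):
    evaluation = None  # no feedback available on the first step
    for _ in range(steps):
        observation = env.observe()
        # Defining property: the prior evaluation is part of the
        # input, not just the function's prior output.
        output = f(observation, evaluation)
        env.act(output)
        evaluation = objective(observation, output)
```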
An LLM that produces outputs that are ultimately fed back into the context window as input is merely an autoregressive system, not necessarily a closed-loop one. In the case of chatbot LLMs and similar systems, there isn't necessarily an evaluation of the output's efficacy fed back into the context window to steer the system's behavior toward a defined objective; these systems are simply autoregressive.
For a closed-loop AI to modify its behavior without a human-in-the-loop training process, its model/function will need to operate directly on the evaluation of its prior performance, and will require an inference-time objective function of some sort to guide this evaluation.
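The contrast might look like the following sketch (again hedged: `llm` and `objective` are hypothetical callables, not a real API). The autoregressive loop feeds only outputs back into the context; the closed-loop variant also feeds back an inference-time evaluation of those outputs.

```python
# Hypothetical sketch; `llm` and `objective` are placeholder
# callables, not any real library's API.

def autoregressive_loop(llm, prompt, steps):
    context = prompt
    for _ in range(steps):
        output = llm(context)
        context += output  # only the output is fed back

def closed_loop_llm(llm, objective, prompt, steps):
    context = prompt
    for _ in range(steps):
        output = llm(context)
        score = objective(output)  # inference-time evaluation
        # The evaluation itself enters the input channel,
        # steering behavior toward the defined objective.
        context += output + f"\n[evaluation: {score}]\n"
```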
A classic example of closed-loop AI is the ‘system 2’ functionality that LeCun describes in his Autonomous Machine Intelligence paper (effectively Model Predictive Control):
https://openreview.net/pdf?id=BZ5a1r-kVsf
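For readers unfamiliar with Model Predictive Control, here is a minimal sketch of the idea under stated assumptions: `world_model`, `cost`, and `sample_action` are hypothetical stand-ins, not the components of LeCun's actual architecture. At each step the agent imagines candidate action sequences through a world model, evaluates them against an objective, executes only the first action of the best sequence, then replans from the new observation.

```python
# Minimal Model Predictive Control sketch (random-shooting variant).
# `world_model`, `cost`, and `sample_action` are hypothetical
# stand-ins for learned components.

def mpc_step(state, world_model, cost, sample_action,
             horizon=5, n_candidates=64):
    best_cost, best_plan = float("inf"), None
    for _ in range(n_candidates):
        plan = [sample_action() for _ in range(horizon)]
        s, total = state, 0.0
        for a in plan:
            s = world_model(s, a)  # imagined rollout, not real actions
            total += cost(s, a)    # inference-time evaluation
        if total < best_cost:
            best_cost, best_plan = total, plan
    # Execute only the first action, then replan next step: the
    # evaluation of imagined outcomes closes the loop at inference time.
    return best_plan[0]
```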