I don’t understand why this crux needs to be dichotomous. Setting aside the opacity question for the moment, why can’t services in a CAIS be differentiable w.r.t. each other?
Example: Consider a language modeling service (L) that is consumed by several downstream services, including various text classifiers, an auto-correction service for keyboards, and a machine translation service. In the end-to-end view, it would be wise for these downstream services to use a language representation from L and to propagate their own error information back to L so that it can improve its shared representation. Since the downstream services ultimately make up L’s raison d’être, L will be obliged to incorporate that feedback.
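To make the example concrete, here is a minimal sketch in plain Python, under heavy simplifying assumptions: L is reduced to a single scalar parameter `w`, the two downstream services are scalar heads `a` and `b`, and the data is synthetic. All of these names and numbers are hypothetical; the only point is that the gradient reaching L is the sum of the error signals from every consumer.

```python
# Toy sketch: a shared service L (parameter w) feeds two downstream
# services (heads a and b). Names and data are hypothetical.

def forward(w, a, b, x):
    h = w * x                  # shared representation produced by L
    return h, a * h, b * h     # outputs of the two downstream services

def total_loss(w, a, b, data):
    loss = 0.0
    for x, t_a, t_b in data:
        _, y_a, y_b = forward(w, a, b, x)
        loss += (y_a - t_a) ** 2 + (y_b - t_b) ** 2
    return loss / len(data)

def gradients(w, a, b, data):
    g_w = g_a = g_b = 0.0
    for x, t_a, t_b in data:
        h, y_a, y_b = forward(w, a, b, x)
        e_a = 2.0 * (y_a - t_a)    # error signal from service A
        e_b = 2.0 * (y_b - t_b)    # error signal from service B
        g_a += e_a * h
        g_b += e_b * h
        # Both downstream errors are propagated back into L.
        g_w += (e_a * a + e_b * b) * x
    n = len(data)
    return g_w / n, g_a / n, g_b / n

def train(data, steps=200, lr=0.05):
    w, a, b = 0.5, 0.5, 0.5
    history = [total_loss(w, a, b, data)]
    for _ in range(steps):
        g_w, g_a, g_b = gradients(w, a, b, data)
        w, a, b = w - lr * g_w, a - lr * g_a, b - lr * g_b
        history.append(total_loss(w, a, b, data))
    return (w, a, b), history

# Synthetic targets: service A wants 2x, service B wants -x.
data = [(x, 2.0 * x, -1.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
params, history = train(data)
```

Here `history` records the combined loss of both downstream services, so comparing `history[0]` with `history[-1]` shows whether joint training helped. In a real system the services would of course be large networks trained with an autodiff library rather than hand-written gradients.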
For situations that are not so neatly differentiable, we can describe the services network as a stochastic computation graph, provided there is a benefit to learning the entire system end-to-end. This should lead to a slightly more precise conjecture about the relationship between a CAIS agent and a utility-maximizing agent: A CAIS agent that can be described as a stochastic computation graph is equivalent to some utility-maximizing agent when trained end-to-end via approximate backpropagation.
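As an illustration of what "approximate backpropagation" through a non-differentiable step can look like, here is a small sketch using the score-function (REINFORCE) estimator, one standard estimator for stochastic computation graphs. The setup is an assumption made up for this example: a single stochastic node routes a request to one of two hypothetical services with probability `sigmoid(theta)`, and each choice has a fixed downstream cost given in `COST`.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical downstream cost of each discrete routing choice.
COST = {0: 1.0, 1: 0.2}

def sample_choice(theta, rng):
    """Stochastic node: pick service 1 with probability sigmoid(theta)."""
    return 1 if rng.random() < sigmoid(theta) else 0

def score_function_grad(theta, rng, n_samples):
    """Estimate d/dtheta E[cost] as the mean of cost(z) * d log p(z)/dtheta."""
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        z = sample_choice(theta, rng)
        # For z ~ Bernoulli(sigmoid(theta)), d log p(z)/dtheta = z - p.
        total += COST[z] * (z - p)
    return total / n_samples

def exact_grad(theta):
    """Closed-form gradient of the expected cost, for comparison."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (COST[1] - COST[0])

def train(theta=0.0, steps=300, lr=0.5, batch=200, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        theta -= lr * score_function_grad(theta, rng, batch)
    return theta
```

The sampled choice itself has no gradient, yet `train` still moves `theta` toward the cheaper service, because the estimator only needs the gradient of the log-probability of the choice. The estimate is noisy, which is one sense in which the backpropagation is approximate.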
It’s likely that CAIS agents aren’t usefully described as stochastic computation graphs, or that we may need to extend the usage of “stochastic computation graph” here to deal with services that create other services as offspring and attach them to the graph. But the possibility itself suggests a spectrum between the archetypal modular CAIS and an end-to-end CAIS, in which subgraphs of the services network are trained end-to-end. It’s not obvious to me that the CAIS as defined in the text discounts this scenario, despite Eric’s comments here.