Interesting, and I agree: this sounds like it deserves a post, and I look forward to reading it.
Briefly for now: I agree, but I have mostly been avoiding thinking much about the scaffolding that we will put around the LLM that is generating the agent, mainly because I’m not certain how much of it we’re going to need long-term, or what it will do (other than allowing continual learning or long-term memory past the context length). Obviously, assuming the thoughts/memories the scaffolding is handling are stored in natural language/symbolic form, or as embeddings in a space we understand well, this gives us translucent thoughts and allows us to do what people are calling “chain-of-thought alignment”. (I’m still not sure that’s the best term for this; I think I’d prefer something with the words ‘scaffolding’ or ‘translucent’ in it, but that seems to be the one the community has settled on.) That seems potentially very important, but without a clear idea of how the scaffolding will be used, I don’t feel like we can do a lot of work on it yet, past maybe some proof-of-concept.
Clearly the mammalian brain has at least three separate systems here: short-term episodic memory, long-term episodic memory, and the learning of skills. Whether that sort of split of functionality is going to be useful in AIs, I don’t know. But then the mammalian brain also has a separate cortex and cerebellum, and I’m not clear what the purpose of that separation is either. So far the internal architectures we’ve implemented in AIs haven’t looked much like human brain structure. I wouldn’t be astonished if they started to converge a bit, but I suspect some of the brain’s divisions may be rather specific to biological constraints that our artificial neural nets don’t have.
I’m also expecting our AIs to be tool users, and perhaps ones that integrate their tool use and LLM-based thinking quite tightly. And I’m definitely expecting those tools to include computer systems, including things like writing and debugging software and then running it, and where appropriate also ones using symbolic AI along more GOFAI lines: things like symbolic theorem provers and so forth. Some of these may be alignment-relevant. Just as there are times when the best way for a rational human to make an ethical decision (especially one involving things like large numbers and small risks that our wetware doesn’t handle very well) is to just shut up and multiply, I think there are going to be times when the right thing for an LLM-based AI to do is to consult something that looks like an algorithmic/symbolic weighing and comparison of the estimated pros and cons of specific plans. I don’t think we can build any such system as a single universally applicable utility function containing our current understanding of the entirety of human values in a single vast equation (as much beloved by the more theoretical thinkers on LW). Even if we can, it’s presumably going to have a complexity in the petabytes/exabytes, so approximating the parts of it that aren’t relevant is going to be common. What I’m talking about is something more comparable to a model in Economics or Data Science. Much like any other models in a STEM field, individual models are going to have limited areas of applicability, and making a specific complex decision may involve finding the applicable ones and patching them together, to make a utility projection with error bars for each alternative plan. If so, this sounds like the sort of activity where things like human oversight, debate, and so forth would be sensible, much as humans currently use them when an organization is making a similarly complex decision.
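To make "patching together applicable models into a utility projection with error bars" a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the names `Estimate`, `PartialModel`, and `project_utility`, the two toy models, and all of the numbers are my own assumptions, not anything from an existing system. It also assumes the models' errors are independent, so their error bars add in quadrature, which real models often won't satisfy.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Estimate:
    mean: float  # projected utility contribution (arbitrary units)
    std: float   # one-sigma error bar on that contribution


@dataclass
class PartialModel:
    """A hypothetical model that is only valid for some plans."""
    name: str
    applies_to: Callable[[dict], bool]    # is the plan inside this model's domain?
    estimate: Callable[[dict], Estimate]  # the model's utility estimate for the plan


def project_utility(plan: dict, models: List[PartialModel]) -> Tuple[Estimate, List[str]]:
    """Sum the estimates of every applicable model.

    Assumes the models' errors are independent, so variances add.
    Also returns the names of the models used, so a reviewer can audit them.
    """
    used = [m for m in models if m.applies_to(plan)]
    parts = [m.estimate(plan) for m in used]
    mean = sum(p.mean for p in parts)
    std = math.sqrt(sum(p.std ** 2 for p in parts))
    return Estimate(mean, std), [m.name for m in used]


# Two toy models with made-up numbers.
cost_model = PartialModel(
    name="direct_cost",
    applies_to=lambda plan: "cost" in plan,
    estimate=lambda plan: Estimate(-plan["cost"], 3.0),
)
risk_model = PartialModel(
    name="rare_failure",
    applies_to=lambda plan: "failure_prob" in plan,
    # "Shut up and multiply": expected loss = probability * harm.
    estimate=lambda plan: Estimate(-plan["failure_prob"] * plan["harm"], 4.0),
)

plans = {
    "cheap_but_risky": {"cost": 10.0, "failure_prob": 0.001, "harm": 1000.0},
    "safe_but_costly": {"cost": 14.0},
}
for label, plan in plans.items():
    projection, used = project_utility(plan, [cost_model, risk_model])
    print(f"{label}: {projection.mean:.1f} +/- {projection.std:.1f} (models: {used})")
```

In this toy case the two plans' projections overlap within their error bars, which is exactly the situation where the numbers alone don't settle the question and the list of models used becomes something for human oversight or debate to examine.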