“Fundamentally incapable” is perhaps too strong, given that the Reflexion paper and other work from just the past two weeks show humans figuring out how to work around this issue via reflection/iterative prompting:
https://nanothoughts.substack.com/p/reflecting-on-reflexion
https://arxiv.org/abs/2303.11366
This simple approach lets GPT-4 jump from 67% to 88% correct on the HumanEval benchmark.
So I believe the lesson is: “limitations” in LLMs may turn out to be fairly easy for clever human helpers to engineer away. Therefore, IMO, judging whether a particular LLM should be considered dangerous must also take into account the likely ways humans will build additional tech onto/around it to enhance it.
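To make the reflection/iterative-prompting idea concrete, here is a minimal sketch of that kind of loop. Everything here is illustrative: `llm` and `run_tests` are hypothetical stand-ins for a real model call and a real test harness, not APIs from the Reflexion paper.

```python
def reflexion_loop(llm, task, run_tests, max_attempts=3):
    """Generate a candidate, test it, and feed failures back as verbal reflections.

    llm: callable(str) -> str, a stand-in for a language-model call.
    run_tests: callable(str) -> (bool, str), returns (passed, feedback).
    """
    reflections = []
    candidate = ""
    for _ in range(max_attempts):
        prompt = task
        if reflections:
            # Prior failures are summarized and prepended to the next attempt.
            prompt += "\n\nLessons from earlier failed attempts:\n" + "\n".join(reflections)
        candidate = llm(prompt)
        ok, feedback = run_tests(candidate)
        if ok:
            return candidate
        # Ask the model to verbalize what went wrong; store it for the next try.
        reflections.append(
            llm("The attempt failed with: " + feedback +
                " In one sentence, what should change next time?")
        )
    return candidate  # best effort after max_attempts
```

The key design point is that the model never sees raw gradients or weights; it only sees its own natural-language summaries of past failures, which is why this works as a pure prompting-time wrapper around a frozen model.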
Yes, there already appears to be work in this area. Here is a recent example I ran across on Twitter, with videos of two relatively low-cost robot arms learning very fine-grained manipulation tasks after apparently just 15 minutes or so of demonstrations:
Introducing ACT: Action Chunking with Transformers
https://twitter.com/tonyzzhao/status/1640395685597159425
Related website:
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
https://tonyzhaozh.github.io/aloha/