For robo-taxis it is more a societal problem than a technical one.
Robo-taxis struggle with edge cases (specific situations in specific places under bad circumstances), usually the same ones where human drivers do even worse. Take pedestrians wearing black on the road on a rainy night: a robo-taxi at least has LIDAR to detect objects in poor visibility. They are also sometimes vulnerable to object-detection hacking (stickers put on signs, paint on the road, etc.). Overall, though, they have fewer problems than human drivers.
Robo-taxis have a public trust problem. Any more serious accident hits the news and spreads distrust, even though they are already safer than human drivers in general.
Robo-taxis, and self-driving cars in general, move responsibility from the driver to the producer. That is responsibility the producer does not want to have and has to count toward costs, which makes investors cautious.
What is missing for us to have robo-taxis is mostly public trust and more investment.
For AGI we already have the basic building blocks; we just need to scale them up and connect them into a proper system. Which building blocks? These:
Memory, duh. It has been around for a long time, with many indexed, high-performance solutions.
Thought generation. We now have LLMs that can generate thoughts from instructions and context, and they can easily be made to interact with other models and with memory. A more complex system can be built from several LLMs with different instructions interacting with each other.
Structuring the system and the communication within it. This can be done with ordinary code.
A loop of thoughts (a stream of thoughts). Easily achieved by running the LLM(s) in a loop; see the sketch after this list.
Vision. Image and video processing. We have plenty of transformer models and image-processing techniques, and there are already decent image-to-text models, even LLM-based ones that can answer questions about images.
Actuators and movement. We have models built for movement on different kinds of machines, including much of humanoid movement. We even have models capable of one-shot or few-shot learning of movements for whatever machine they are attached to.
Learning new abilities. LLMs can write code, so a system can write code for itself, building more complex procedures out of basic commands. There was a project (Voyager) where an LLM explored and learned Minecraft starting from only very basic procedures: it wrote code for more complex operations, then used that code to move around, do things, and build stuff.
Connection to external interfaces (even GUIs). An interface can be wrapped in a basic API that the system can explore, memorize, and call, building more complex operations for itself.
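To make the "connect them into a proper system" part concrete, here is a minimal sketch of how these blocks could be wired together. All of it is illustrative: `call_llm` is a stand-in for whatever model API you use, and the memory is a naive keyword index where a real system would use an indexed vector store.

```python
# Minimal sketch: a looping LLM ("stream of thoughts") glued to a memory
# store and a growing skill library with plain Python code.
from typing import Callable


def call_llm(prompt: str) -> str:
    """Placeholder: plug in whatever LLM backend you actually use."""
    raise NotImplementedError


class Memory:
    """Naive append-only memory with keyword recall. A real system would
    use an indexed, high-performance store (e.g. a vector database)."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def store(self, text: str) -> None:
        self.entries.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        words = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(words & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]


class Agent:
    def __init__(self) -> None:
        self.memory = Memory()
        self.skills: dict[str, Callable] = {}  # learned procedures

    def learn_skill(self, name: str, source: str) -> None:
        """Voyager-style self-extension: compile LLM-written code into a
        callable procedure (a real system would sandbox and test it)."""
        namespace: dict = {}
        exec(source, namespace)  # sketch only; assumes trusted code
        self.skills[name] = namespace[name]

    def step(self, goal: str) -> str:
        context = "\n".join(self.memory.recall(goal))
        thought = call_llm(
            f"Goal: {goal}\nRelevant memories:\n{context}\n"
            f"Known skills: {list(self.skills)}\nNext thought and action:")
        self.memory.store(thought)  # thoughts become future context
        return thought

    def run(self, goal: str, steps: int = 10) -> None:
        for _ in range(steps):  # the loop of thoughts
            print(self.step(goal))
```

The point is that the glue really is ordinary code: the `run` loop is the stream of thoughts, `Memory.recall` feeds old thoughts back in as context, and `learn_skill` is the Minecraft-style trick of letting the model extend its own library of procedures.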
What is missing for AGI:
Performance. LLMs are fast at reading input data but much slower at inferring the result, and they do not scale very well (though still better than humans). A complex multi-model system on current LLMs would be either fast but somewhat dumb and error-prone (fast open-source models, even GPT-3.5) or better but very slow.
Cost-effectiveness. For sporadic use like “write me this or that” it is cost-effective, but for a continuous stream of thoughts, especially with several models, it does not compare well to a remote human worker. It needs further advances, maybe dedicated hardware.
Learning, refining, and testing are very slow and costly with these models, which puts a cap on anyone wanting to build something sensible. The main players are taking rather slow steps toward AGI.
The context window of the models is rather short at the moment. The best of the powerful models have a window of about 32 thousand tokens. There are models that trade quality for the ability to operate on more tokens, but those are not the best ones. 32k sounds like a lot, but when you need a lot of context and information to produce coherent thoughts on non-trivial topics not rooted in the model's training data, it becomes a problem. That is exactly the case with a stream of thoughts, if you need it to analyze instructions, analyze context and inputs, propose a strategy and refine it, propose the current tactic and refine it, propose next moves and decisions and refine them, generate instructions for the task at hand, and also process learning (adding new procedures, code, memories, etc. to reuse later). Some modern LLMs are technically capable of all that, but the context window is a roadblock for anything non-trivial here; the rough budget below shows why.
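To show how quickly 32k runs out in such a setup, here is a back-of-the-envelope budget for a single step of that stream of thoughts. All the numbers are my own illustrative assumptions, not measurements of any real system:

```python
# Back-of-the-envelope context budget for one step of a "stream of thoughts".
# Every number here is an illustrative assumption, not a measurement.
budget = 32_000  # tokens, roughly the window of today's best powerful models

consumers = {
    "system instructions / persona":  2_000,
    "task context and inputs":        8_000,
    "recalled memories":              6_000,
    "strategy proposal + refinement": 4_000,
    "tactics + next moves":           4_000,
    "instructions for current task":  2_000,
    "learning output (new skills)":   3_000,
}

used = sum(consumers.values())
print(f"used {used} of {budget} tokens, {budget - used} left")
# -> used 29000 of 32000 tokens, 3000 left
```

That leaves almost no headroom before a single long document, codebase, or conversation history enters the picture.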
If I had to guess, I would say AGI will arrive at scale sooner, simply because there is hype, there are big investments, and the main problems are currently less “we need a breakthrough” and more “we need refinements”. For robo-taxis we still need a lot more investment and some breakthroughs in the areas of public trust and law.
Both seem around the corner to me.