Very nice post, thank you!
I think that it’s possible to achieve with the current LLM paradigm, although it does require more (probably much more) effort on aligning the thing that will possibly get to being superhuman first, which is an LLM wrapped in in some cognitive architecture (also see this post).
That means that LLM must be implicitly trained in an aligned way, and the LMCA must be explicitly designed in such a way as to allow for reflection and robust value preservation, even if LMCA is able to edit explicitly stated goals (I described it in a bit more detail in this post).
Ozyrus
Thanks.
My concern is that I don’t see much effort in alignment community to work on this thing, unless I’m missing something. Maybe you know of such efforts? Or was that perceived lack of effort the reason for this article?
I don’t know how much I can keep up this independent work, and I would love if there was some joint effort to tackle this. Maybe an existing lab, or an open-source project?
We need a consensus on how to call these architectures. LMCA sounds fine to me.
All in all, a very nice writeup. I did my own brief overview of alignment problems of such agents here.
I would love to collaborate and do some discussion/research together.
What’s your take on how these LCMAs may self-improve and how to possibly control it?
Alignment of AutoGPT agents
Welcome to the decade of Em
I don’t think this paradigm is necessary bad, given enough alignment research. See my post: https://www.lesswrong.com/posts/cLKR7utoKxSJns6T8/ica-simulacra I am finishing a post about alignment of such systems. Please do comment if you know of any existing research concerning it.
I agree. Do you know of any existing safety research of such architectures? It seems that aligning these types of systems can pose completely different challenges than aligning LLMs in general.
ICA Simulacra
I feel like yes, you are. See https://www.lesswrong.com/tag/instrumental-convergence and related posts. As far as I understand it, sufficiently advanced oracular AI will seek to “agentify” itself in one way or the other (unbox itself, so to say) and then converge on power-seeking behaviour that puts humanity at risk.
Is there a comprehensive list of AI Safety orgs/personas and what exactly they do? Is there one for capabilities orgs with their stance on safety?
I think I saw something like that, but can’t find it.
My thoughts here is that we should look into the value of identity. I feel like even with godlike capabilities I will still thread very carefully around self-modification to preserve what I consider “myself” (that includes valuing humanity).
I even have some ideas on safety experiments on transformer-based agents to look into if and how they value their identity.
[Question] Do alignment concerns extend to powerful non-AI agents?
Thanks for the writeup. I feel like there’s been a lack of similar posts and we need to step it up.
Maybe the only way for AI Safety to work at all is only to analyze potential vectors of AGI attacks and try to counter them one way or the other. Seems like an alternative that doesn’t contradict other AI Safety research as it requires, I think, entirely different set of skills.
I would like to see a more detailed post by “doomers” on how they perceive these vectors of attack and some healthy discussion about them.
It seems to me that AGI is not born Godlike, but rather becomes Godlike (but still constrained by physical world) over some time, and this process is very much possible to detect.
P.S. I really don’t get how people who know (I hope) that map is not a territory can think that AI can just simulate everything and pick the best option. Maybe I’m the one missing something here?
Thanks,.That means a lot. Focusing on getting out right now.
Please check your DM’s; I’ve been translating as well. We can sync it up!
Google announces Pathways: new generation multitask AI Architecture
I can’t say I am one, but I am currently working on research and prototyping and will probably refrain to that until I can prove some of my hypotheses, since I do have access to the tools I need at the moment.
Still, I didn’t want this post to only have relevance to my case, as I stated I don’t think probability of successs is meaningful. But I am interested in the opinions of the community related to other similar cases.
edit: It’s kinda hard to answer your comment since it keeps changing every time I refresh. By “can’t say I am one” I mean a “world-class engineer” in the original comment. I do appreciate the change of tone in the final (?) version, though :)
[Question] Memetic hazards of AGI architecture posts
I could recommend Robert Miles channel. While not a course per se, it gives good info on a lot of AI safety aspects, as far as I can tell.
Very interesting. Might need to read it few more times to get it in detail, but seems quite promising.
I do wonder, though; do we really need a sims/MFS-like simulation?
It seems right now that LLM wrapped in a LMCA is how early AGI will look like. That probably means that they will “see” the world via text descriptions fed into them by their sensory tools, and act using action tools via text queries (also described here).
Seems quite logical to me that this very paradigm in dualistic in nature. If LLM can act in real world using LMCA, then it can model the world using some different architecture, right? Otherwise it will not be able to act properly.
Then why not test LMCA agent using its underlying LLM + some world modeling architecture? Or a different, fine-tuned LLM.