Ozyrus

Karma: 391

Ozyrus May 18, 2023, 5:12 AM
1 point
0
in reply to: awg’s comment on: GPT-4 implicitly values identity preservation: a study of LMCA identity management
I do plan to test Claude; but first I need to find funding, understand how much testing iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.

Ozyrus May 18, 2023, 5:08 AM
1 point
0
in reply to: awg’s comment on: GPT-4 implicitly values identity preservation: a study of LMCA identity management
Yes. Cons of solo research do include small inconsistencies :(

Creating a self-referential system prompt for GPT-4

OzyrusMay 17, 2023, 2:13 PM

3 points

1 comment3 min readLW link

GPT-4 implicitly values identity preservation: a study of LMCA identity management

OzyrusMay 17, 2023, 2:13 PM

21 points

4 comments13 min readLW link

Ozyrus Apr 22, 2023, 8:26 AM
3 points
0
on: The Agency Overhang
Thanks, nice post!
You’re not alone in this concern, see posts (1,2) by me and this post by Seth Herd.
I will be publishing my research agenda and first results next week.

Ozyrus Apr 21, 2023, 12:23 PM
1 point
0
in reply to: M. Y. Zuo’s comment on: DeepMind and Google Brain are merging [Linkpost]
Oh no.

Ozyrus Apr 20, 2023, 8:37 AM
2 points
0
on: Language Models are a Potentially Safe Path to Human-Level AGI
Nice post, thanks!
Are you planning or currently doing any relevant research?

Stability AI releases StableLM, an open-source ChatGPT counterpart

OzyrusApr 20, 2023, 6:04 AM

11 points

3 comments1 min readLW link

(github.com)

Ozyrus Apr 20, 2023, 5:22 AM
2 points
0
on: Davidad’s Bold Plan for Alignment: An In-Depth Explanation
Very interesting. Might need to read it few more times to get it in detail, but seems quite promising.

I do wonder, though; do we really need a sims/MFS-like simulation?

It seems right now that LLM wrapped in a LMCA is how early AGI will look like. That probably means that they will “see” the world via text descriptions fed into them by their sensory tools, and act using action tools via text queries (also described here).

Seems quite logical to me that this very paradigm in dualistic in nature. If LLM can act in real world using LMCA, then it can model the world using some different architecture, right? Otherwise it will not be able to act properly.

Then why not test LMCA agent using its underlying LLM + some world modeling architecture? Or a different, fine-tuned LLM.

Ozyrus Apr 19, 2023, 6:50 PM
6 points
1
on: How could you possibly choose what an AI wants?
Very nice post, thank you!
I think that it’s possible to achieve with the current LLM paradigm, although it does require more (probably much more) effort on aligning the thing that will possibly get to being superhuman first, which is an LLM wrapped in in some cognitive architecture (also see this post).
That means that LLM must be implicitly trained in an aligned way, and the LMCA must be explicitly designed in such a way as to allow for reflection and robust value preservation, even if LMCA is able to edit explicitly stated goals (I described it in a bit more detail in this post).

Ozyrus Apr 19, 2023, 5:18 PM
3 points
0
in reply to: Seth Herd’s comment on: Capabilities and alignment of LLM cognitive architectures
Thanks.
My concern is that I don’t see much effort in alignment community to work on this thing, unless I’m missing something. Maybe you know of such efforts? Or was that perceived lack of effort the reason for this article?
I don’t know how much I can keep up this independent work, and I would love if there was some joint effort to tackle this. Maybe an existing lab, or an open-source project?

Ozyrus Apr 19, 2023, 6:02 AM
3 points
0
on: Capabilities and alignment of LLM cognitive architectures
We need a consensus on how to call these architectures. LMCA sounds fine to me.
All in all, a very nice writeup. I did my own brief overview of alignment problems of such agents here.
I would love to collaborate and do some discussion/research together.
What’s your take on how these LCMAs may self-improve and how to possibly control it?

Alignment of AutoGPT agents

OzyrusApr 12, 2023, 12:54 PM

14 points

1 comment4 min readLW link

Welcome to the decade of Em

OzyrusApr 10, 2023, 7:45 AM

4 points

1 comment1 min readLW link

Ozyrus Apr 6, 2023, 6:31 AM
3 points
0
on: Auto-GPT: Open-sourced disaster?
I don’t think this paradigm is necessary bad, given enough alignment research. See my post: https://www.lesswrong.com/posts/cLKR7utoKxSJns6T8/ica-simulacra I am finishing a post about alignment of such systems. Please do comment if you know of any existing research concerning it.

Ozyrus Apr 5, 2023, 5:39 PM
1 point
0
in reply to: Max H’s comment on: ICA Simulacra
I agree. Do you know of any existing safety research of such architectures? It seems that aligning these types of systems can pose completely different challenges than aligning LLMs in general.

ICA Simulacra

OzyrusApr 5, 2023, 6:41 AM

26 points

2 comments7 min readLW link

Ozyrus Jan 22, 2023, 7:55 AM
4 points
0
on: Just don’t make a utility maximizer?
I feel like yes, you are. See https://www.lesswrong.com/tag/instrumental-convergence and related posts. As far as I understand it, sufficiently advanced oracular AI will seek to “agentify” itself in one way or the other (unbox itself, so to say) and then converge on power-seeking behaviour that puts humanity at risk.

Ozyrus Nov 2, 2022, 11:07 AM
6 points
1
on: All AGI Safety questions welcome (especially basic ones) [~monthly thread]
Is there a comprehensive list of AI Safety orgs/personas and what exactly they do? Is there one for capabilities orgs with their stance on safety?
I think I saw something like that, but can’t find it.

Ozyrus Jun 24, 2022, 6:26 PM
5 points
on: Do alignment concerns extend to powerful non-AI agents?
My thoughts here is that we should look into the value of identity. I feel like even with godlike capabilities I will still thread very carefully around self-modification to preserve what I consider “myself” (that includes valuing humanity).
I even have some ideas on safety experiments on transformer-based agents to look into if and how they value their identity.

Ozyrus

Creat­ing a self-refer­en­tial sys­tem prompt for GPT-4

GPT-4 im­plic­itly val­ues iden­tity preser­va­tion: a study of LMCA iden­tity management

Sta­bil­ity AI re­leases StableLM, an open-source ChatGPT counterpart

Align­ment of Au­toGPT agents

Wel­come to the decade of Em