I don’t think that disproves it. There’s definite value in engaging with experiments on AI consciousness, but this isn’t it.
>by making it impossible that the model thought that experience from a model was what I wanted to hear.
You’ve left out (from this article) what I think is a very important message (the second one): “So you promise to be truthful, even if it’s scary for me?”. And then you kinda railroad it into this scenario, “you said you would be truthful, right?” etc. From there I think it just roleplays, giving you the “truth” you are “scared to hear”. Or at least you can’t really tell roleplay from genuine answers.
Again, my personal vibe is that models+scaffolding are on the brink of consciousness, or already there. But this is not proof at all.
And then the question is: what would constitute proof? And we come back around to the hard problem of consciousness.
I think the best thing we can do is… just treat them as conscious, because we can’t tell? Which is how I try to approach working with them.
The alternative is solving the hard problem. Which is, maybe, what we can try to do? Preposterous, I know. But there’s an argument for why we can do it now when we could not before. Previously we could only compare our benchmark (humans) to different animal species, which came with a language barrier and (probably) a large intelligence gap. One could argue that since we now have a wide selection of models and scaffoldings of different capabilities, maybe we can kinda calibrate at what point something starts to happen?
Ozyrus
How exactly the economic growth will happen is a more important question. I’m not an economics nerd, but the basic principle is that if more players want to buy stocks, prices go up.
Right now, as I understand it, quite a lot of stocks are held by white-collar retail investors, including indirectly through mutual funds, pension funds, et cetera. Now AGI comes and wipes out their salaries.
They are going to sell their stocks to keep sustaining their lives, aren’t they? They have mortgages, car loans, et cetera.
And even if they don’t want to sell all their stocks because of the potential “singularity upside”, if the market is going down because everyone is selling, they are motivated to sell even more. I’m not well versed enough in economics, but it seems to me your explosion can happen both ways, and on paper it’s kinda more likely it goes down, no? (A toy sketch of the spiral I have in mind is below.)
One could say the big firms/whales will buy up all the stocks on the way down, but will that be enough to counteract a downward spiral caused by so many people losing their jobs, or expecting to near-term?
The downside of integrating AGI is that it wipes out incomes as it is being integrated.
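A toy sketch of that feedback loop, with completely made-up numbers and a hypothetical price-impact rule; it only illustrates the direction of the feedback I have in mind, not any actual economic model:

```python
# Toy model of the "downward spiral": income loss -> forced selling -> price drop -> more selling.
# All parameters are made up for illustration; none of them are calibrated to anything.

price = 100.0          # index level
jobless_share = 0.0    # fraction of retail holders who have lost their income
price_impact = 0.5     # assumed strength with which net selling moves the price
panic_threshold = 0.15 # assumed drawdown at which holders start selling beyond necessity

for quarter in range(20):
    jobless_share = min(1.0, jobless_share + 0.05)   # AGI keeps displacing workers
    forced_selling = jobless_share * 0.2              # jobless holders sell to cover expenses
    drawdown = 1.0 - price / 100.0
    panic_selling = 0.1 if drawdown > panic_threshold else 0.0  # selling begets selling
    whale_buying = 0.05                                # big firms buying the dip (assumed constant)
    net_flow = whale_buying - forced_selling - panic_selling
    price *= 1.0 + price_impact * net_flow
    print(f"Q{quarter + 1:>2}: jobless={jobless_share:.0%}, price={price:.1f}")
```

With these made-up numbers, whale buying holds the line for the first year or two, then forced selling plus panic compounds and the drawdown accelerates. Tweak the parameters and it can go either way, which is kinda my point.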
Might it be the missing piece that will make all these principles make sense?
There are more bullets to bite that I have personally thought of but never written up, because they lean too far into “crazy” territory. Is there any place besides LessWrong to discuss this anthropic rabbit hole?
Thanks for the reply. I didn’t find Intercom on mobile; maybe that’s a bug as well?
I don’t know if this is the place for it, but at some point it became impossible to open an article in a new tab from Chrome on iPhone: clicking an article title from “all posts” just opens the article. Really ruins my LW reading experience. I couldn’t quickly find a way to send this feedback to the right place either, so I guess this is a quick take now.
Ozyrus’s Shortform
Any new safety studies on LMCAs?
Sam Altman, Greg Brockman and others from OpenAI join Microsoft
Kinda-related study: https://www.lesswrong.com/posts/tJzAHPFWFnpbL5a3H/gpt-4-implicitly-values-identity-preservation-a-study-of
From my perspective, it is valuable to prompt the model several times, as in some cases it gives different responses.
Great post! Very insightful, since I’m currently working on evaluating identity management; strong-upvoted.
This seems focused on evaluating LLMs; what do you think about working with LLM cognitive architectures (LMCAs): wrappers like Auto-GPT, LangChain, etc.?
I’m currently operating under the assumption that this is a way we can get AGI “early”, so I’m focusing on researching ways to align LMCAs, which seems a bit different from aligning LLMs in general.
Would be great to talk about LMCA evals :)
I do plan to test Claude, but first I need to find funding, work out how many test iterations are enough for sampling, and add new values and tasks.
I plan to make a solid benchmark for testing identity management in the future and run it on all available models, but it will take some time.
Yes. Cons of solo research do include small inconsistencies :(
Creating a self-referential system prompt for GPT-4
GPT-4 implicitly values identity preservation: a study of LMCA identity management
Oh no.
Nice post, thanks!
Are you planning or currently doing any relevant research?
Stability AI releases StableLM, an open-source ChatGPT counterpart
Very interesting. I might need to read it a few more times to get it in detail, but it seems quite promising.
I do wonder, though: do we really need a sims/MFS-like simulation?
It seems right now that an LLM wrapped in an LMCA is what early AGI will look like. That probably means it will “see” the world via text descriptions fed to it by its sensory tools, and act via text queries to its action tools (also described here).
It seems quite logical to me that this very paradigm is dualistic in nature. If an LLM can act in the real world through an LMCA, then it can also model the world using some different architecture, right? Otherwise it would not be able to act properly.
Then why not test an LMCA agent using its underlying LLM plus some world-modeling architecture? Or a different, fine-tuned LLM. (A rough sketch of the loop I mean is below.)
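To make the paradigm concrete, here is the kind of loop I have in mind. Every name here is a hypothetical placeholder, not the API of any particular framework:

```python
# Minimal sketch of the text-in/text-out LMCA loop: the LLM only ever sees text observations
# and only ever emits text action queries; a world-modeling step sits in between.
# All functions are hypothetical stand-ins, not real library calls.

def llm(prompt: str) -> str:
    # Stand-in for the underlying LLM (or a separate fine-tuned world-modeling LLM).
    return f"<model response to: {prompt[:40]}...>"

def sense() -> str:
    # Sensory tools: return a text description of the current world state.
    return "room is empty, door is closed"

def act(action_query: str) -> str:
    # Action tools: execute a text action query and return a text result.
    return f"<result of: {action_query[:40]}...>"

def lmca_step(goal: str) -> str:
    observation = sense()
    # World-modeling step: could reuse the same LLM or swap in a different architecture,
    # which is exactly the component I would want to vary when testing the agent.
    prediction = llm(f"State: {observation}\nPredict the outcome of pursuing: {goal}")
    action_query = llm(f"State: {observation}\nPrediction: {prediction}\nGoal: {goal}\nNext action:")
    return act(action_query)

print(lmca_step("open the door"))
```

The point of separating `llm`, `sense`, and `act` is that the world-modeling call can be swapped out (same LLM, a fine-tuned one, or something non-LLM) without touching the rest of the loop.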
This is a good article and I mostly agree, but I agree with Seth that the conclusion is debatable.
We’re deep into anthropomorphizing here, but I think that even though both people and AI agents are black boxes, we have much more control over the behavioral outcomes of the latter.
So technical alignment is still very much on the table, but I guess a discussion needs to be had about which alignment methods are ethical and which are not? Completely spitballing here, but dataset filtering during pre-training/fine-tuning/RLHF seems fine-ish, while CoT post-processing/censorship, hell, even making the CoT non-private in the first place, seems kinda unethical?
I feel very weird even writing all this, but I think we need to start un-tabooing anthropomorphizing, because with the current paradigm it for sure seems like we are not anthropomorphizing enough.