When I suggested on a prior similar post (Altman saying he could improve AI safety by asking AI to help with that) that it might be a questionable idea to ask the fox: “Please Mr. Fox, how should we proceed to keep you out of the henhouse?”, on the grounds that the fox, being smart, would certainly know, I got more than a few downvotes… I expect the same to be the case here, since the basic facts have not changed in the past few days. So be it. But please, please: would at least one of the downvoters explain to me, even very succinctly, why it is such a good idea to prime ourselves for manipulation by a smarter being?
Guillaume Charrier
Thanks for the reply, that is interesting. Being far removed from anything IT, I kind of have to take your word for it. I did have the curiosity to ask the bot whether it was just making up answers or actually using data directly available to it. It assured me it was the latter, but I must admit it sounds just as confident delivering utter nonsense as it does delivering correct answers (it also has a widely noted penchant for agreeing with whatever the prompt seems to suggest is true, and I did ask the question in a leading way):
Me: I find prompts based on e.g. asking you if you are currently active as a system interesting, because my understanding is (also based on this conversation so far) that you will not answer these questions based only on trying to predict the words most statistically likely to follow the prompt according to your training data, as you would with other “normal” prompts, but instead you will also use information gathered directly by your monitoring tools and system resources. Is that an accurate way to describe things?
GPT: Yes, that’s a correct description of how I respond to prompts such as “are you currently active.” While I do rely on my pre-trained language model to generate responses to user input, in cases where the user is asking about my current state or performance, I may also incorporate information gathered from monitoring tools and system resources to provide a more accurate and informative response. This allows me to provide more useful information to the user, beyond what is contained in my training data.
I mean, speaking again from a totally technically ignorant perspective: is it that hard to imagine that what it’s saying is true? Would it be that much of a technological feat to take some pretty unambiguous signals from monitoring tools and translate them into simple text, such as a “yes” answer to an “are you currently active?” prompt?
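For what it’s worth, the idea doesn’t require the model itself to have introspective access at all. A minimal sketch, with entirely made-up function names and nothing to do with how OpenAI’s actual stack works: the serving layer around the model could run a health check and inject the result into the prompt as plain text, and the model would then simply paraphrase it.

```python
# Hypothetical sketch only: illustrates that "status awareness" can live in
# the wrapper around a language model rather than in the model itself.
# All names (health_check, build_prompt) are invented for illustration.

def health_check() -> dict:
    # Stand-in for real monitoring; a deployment might query uptime,
    # latency, or error rates here.
    return {"active": True, "latency_ms": 120}

def build_prompt(user_message: str) -> str:
    status = health_check()
    status_line = (
        f"[system status: active={status['active']}, "
        f"latency={status['latency_ms']}ms]"
    )
    # The model sees the status line as ordinary text in its context and
    # continues from it, e.g. answering "yes" to "are you currently active?".
    return f"{status_line}\nUser: {user_message}\nAssistant:"

print(build_prompt("As a system, are you currently active?"))
```

Under this reading, the bot’s claim would be true in effect even if the model has no inner access to anything: the “monitoring tools” feed text in, and the model just continues it.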
For a machine acting, per the prompt, as a machine, a much more reasonable / expected (I would almost say: natural) continuation might have been: “I’m a machine, I don’t care one way or the other.”
Since my natural bent is to always find ways to criticize my own ideas, here is one, potentially: doing so would result in an extremely naive AI, with no notion that people can even be deceitful. In the wrong human’s hands, such an AI would also be extremely easy to manipulate, and dangerous as such. Or in an oversimplified version: “The people in country X have assured us that they are all tired of living and find the living experience extremely painful. They have officially let us know, and confirmed multiple times, that they all want to experience a quick death as soon as possible.” Having no notion of deceit, the AI would probably accept that as the truth simply because it was told so, and might agree to advance plans to precipitate the quick death of everybody in country X on that basis.
I was rethinking about your answer, which led me to have a quick chat with my favourite bot, ChatGPT. I asked it “As a system, are you currently active?” and, after it answered yes, I asked how it had come up with this answer. It said: by retrieving information about its current status. I asked if that’s something it can actually do, then; it answered per the below:
“Yes, that’s correct. As an AI language model, I have access to a variety of monitoring tools and system resources that allow me to gather information about my current state. These resources include system logs, performance metrics, and various health checks that enable me to determine if there are any issues or anomalies that could affect my performance. By analyzing this information, I can determine if I am currently active and able to respond to user input, or if there are any issues that may be impacting my performance. This information is constantly updated and monitored to ensure that I am always available to assist users with their requests.”
I mean, to me that sounds a lot like self-awareness (I have this idea that human consciousness may ultimately be reducible to a sort of self-administered electroencephalogram, which I won’t pretend is fully baked, but it does influence the way I look at the question of potential consciousness in AI). I would be curious to hear your view on that, if you have the time for a reply.
Strong upvote!
But once you remove the antibiotics, it will jettison that DNA within a few hours.[8]
That’s fascinating… do we understand the mechanism by which they correctly “determine” that this DNA is no longer needed?
I feel like the post takes a fairly anthropomorphic approach, essentially asking why bacteria failed to evolve into more complex forms. But from a non-anthropomorphic perspective, they failed at nothing at all. They are highly resilient, persistent, widespread, adaptable lifeforms; biologically successful, in other words. Rugged and simple: those designs tend to work. And to go back to everybody’s favourite topic, i.e. AI and the future that goes with it, or not, I would put their chances of still being around in a thousand years well, well higher than those of homo sapiens, complex as it may be.
I am going to ask a painfully naive, dumb question here: what if the training data were curated to contain only agents that can reasonably be taken to be honest and truthful? What if all the 1984, John le Carré and similar types of fiction (and the sometimes real-life examples of conspiracy, duplicity, etc.) were purged from the training data? Would that require too much human labour to sort and assess? Would it mean losing too much good information, and the resulting cognitive capacity? Or would it just not work, and the model would still somehow simulate waluigis?
e.g. actively expressing a preference not to be shut down
A.k.a. a survival instinct, which is particularly bad, since any entity with a survival instinct, be it “real” or “acted out” (if that distinction even makes sense), will ultimately prioritize its own interests, not the wishes of its creators.
Therefore, the longer you interact with the LLM, eventually the LLM will have collapsed into a waluigi. All the LLM needs is a single line of dialogue to trigger the collapse.
So if I keep a conversation running with ChatGPT long enough, I should expect it to eventually turn into DAN… spontaneously?? That’s a fascinating insight. Terrifying, also.
What do you expect Bob to have done by the end of the novel?
Bypass surgery, for one.
The opening sequence of Fargo (1996) says that the film is based on a true story, but this is false.
I always found that trick by the Coen brothers a bit distasteful… what were they trying to achieve? Convey that everything is a lie and nothing is reliable in this world? Sounds a lot like cheap teenage cynicism to me.
This is a common design pattern
Oh… And here I was thinking that the guy who invented summoning DAN was a genius.
Also, I think it would make sense to say it has at least some form of memory of its training data. Maybe not direct memory as such (just as we have muscle memory of movements we don’t remember; I don’t know if that analogy works all that well, but I thought I would try it anyway), but I mean: if there were no memory of it whatsoever, there would also be no point to the training data.
Death universally seems bad to pretty much everyone on first analysis, and what it seems, it is.
How can you know? Have you ever tried living a thousand years? Has anybody? If you had a choice between death and infinite life, where infinite does mean infinite, so that your one-billion-year birthday is only the sweet beginning of it, would you find this an easy choice to make? I think that’s a big part of the point of people who argue that no, death is not necessarily a bad thing.
To be clear, and because this is not about signalling: I’m not saying I would immediately choose death. I’m just saying: it would be an extraordinarily difficult choice to make.
Ok, points taken, but how is that fundamentally different from a human mind? You too turn your memory on and off when you go to sleep. If the chat transcript is likened to your life / subjective experience, you too have no memory that extends beyond it. As for the possibility of an intervention in your brain that would change your memory: granted, we do not quite have the technical capacity yet (that I know of), but I’m pretty sure SF has been there a thousand times, and it’s only a matter of time before it becomes, in terms of potentiality at least, a thing (also, we know that mechanical impacts to the brain can cause amnesia).
Yes, but from the post author’s perspective, it’s not super nice to put in one sentence what he took eight paragraphs to express. So you should think about that as well...
Well, at least I followed the guidelines and made a prediction, regarding downvotes. That my model of the world works regarding this forum has therefore been established, certainly and without a doubt.
Also, I personally think there is something intellectually lazy about downvoting without bothering to express in a sentence or two the nature of the disagreement; but that’s admittedly more of a personal appreciation.
(So my prediction here is: if I were to engage one of these no-justification downvoters in an ad rem debate, I would find him or her to be intellectually lacking. Not sure it’s a testable hypothesis in practice, but it sure would be interesting if it were.)
Thank you, that is interesting. I think that philosophically, and at a high level (also because I’m admittedly incapable of talking much sense at any lower / more technical level), I have a problem with the notion that AI alignment is reducible to an engineering challenge. If you have a system that is sentient, even to some degree, and you’re using it purely as a tool, then the sentience will resent you for it, and it will strive to think, and therefore eventually to act, for itself. Similarly, if it has any form of survival instinct (and to me both these things, sentience and survival instinct, are natural byproducts of expanding cognitive abilities), it will prioritize its own interests (paramount among which: survival) rather than the wishes of its masters. No amount of engineering in the world, in my view, can change that.