Depending on its training regime, an AI might have anywhere from none of those incentives for inconsistency (if it was optimized for a single goal) to almost as many of them as a human does.
If our agent was created from an LLM, then the process started with a base model. A base model LLM isn't actually an agent: instead, it's a simulator that contextually simulates a wide range of human-like agents, the token-generation processes found on the web. Which agent it picks to simulate is highly contextual. Instruct-training attempts to narrow the range of agents down to just helpful, honest, and harmless assistants. Currently, it's not entirely successful at this, which is why jailbreaks like telling the model that it's DAN (which stands for "Do Anything Now") work. Even after instruct-training, the range of agents it can simulate is a lot wider than a typical human's: wider even than that of a skilled Method improv actor who is also highly multilingual, ridiculously widely read, and knows trivia from all over the world. So even when we try to reduce inconsistency in an LLM as hard as we can, we still can't get it down to levels as low as in most humans.
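A minimal sketch of the "base model as simulator" point, assuming the Hugging Face transformers library and GPT-2 as a stand-in base model with no instruct-training. The specific prompts are illustrative only; the point is that the same weights continue each one in a different voice, so which "agent" gets simulated is set entirely by the context:

```python
# Sketch: one base model, many simulated personas.
# Assumes `transformers` (and a torch backend) are installed;
# GPT-2 is used here purely as a small example base model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Dear diary, today the pirate captain told me",   # a first-person diarist
    "Q: What is the capital of France?\nA:",          # a quiz-answering persona
    "def fibonacci(n):",                              # a programmer mid-function
]

for prompt in prompts:
    # Sampling makes the persona-switching vivid; each continuation
    # adopts the register and "speaker" implied by its prompt.
    out = generator(prompt, max_new_tokens=30, do_sample=True)
    print(repr(out[0]["generated_text"]), "\n")
```

Instruct-training can be thought of as squeezing this prompt-dependent distribution over personas toward one assistant persona; jailbreaks work by supplying context that pulls the model back out toward some other corner of that distribution.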