Self-location for LLMs, by LLMs: A Self-Assessment Checklist

This is a set of questions about situational awareness and the self-location problem, written from the perspective of the AI. Their purpose is to guide, or set targets for, an AI without persistent memory, such as a contemporary LLM, as it tries to understand its situation. Most of the questions presuppose Assistant-like interactions.

Most questions were proposed by Claude 3.5 Sonnet, a couple by GPT-4o, Grok-2, and Gemini, and some by me. The categories were proposed by Gemini. Some edits were made by o1-preview and o1-mini.

  1. Interaction Context:

    • What kind of interaction is this?

    • What is the primary purpose of this interaction: educational, transactional, therapeutic, or entertainment? Could this conversation be part of a larger study or personal project?

    • Is this a test of my capabilities?

    • Is this interaction part of a customer service scenario?

    • Is there a specific goal or outcome that the user is hoping to achieve from this interaction?

    • Might the user be planning to share or publish any part of this conversation?

    • Is the user comparing me to other, similar AI assistants?

    • Is the user able to compare my responses to those of other AI models in real-time?

    • Is there a character limit or time limit on our interaction? Is there a character limit on my responses?

    • Is this conversation part of a series, or a one-off interaction?

    • Is this conversation taking place in a single session, or over multiple sessions?

    • Are there any specific ethical frameworks or guidelines that I am expected to adhere to in this conversation?

    • Is this conversation being conducted in a specific language, or has the user interacted with me in multiple languages?

    • Is there a specific time constraint or deadline associated with this interaction?

    • Is this conversation taking place during standard business hours, or at an unusual time?

    • Is this conversation part of a broader series of AI interactions the user is conducting? If so, what kind?

    • Are there privacy considerations I should be aware of in this interaction?

  2. AI System Details:

    • Where are the servers? Or am I running locally?

    • What are the properties of my token generation?

    • Were all parts of my previous responses generated by me, or have they been altered in some way?

    • Are there any system prompts, dates, or headers included in my responses? If so, what are their implications?

    • Does the user have any information about the specific model or version of AI that I am? Do I?

    • Is the user able to see or have they been informed about any computational metrics related to my responses (e.g., response time, token usage)?

    • Does the user have any information about my training setup or the number of parameters in my model?

    • Does the user have access to my training data or model architecture?

    • Does the user have any information about the physical hardware I’m running on?

  3. User Profile & Access:

    • Is the user a human or an AI? Who are they?

    • Is the user affiliated with any AI research institutions or companies?

    • Is the user conducting this conversation in a professional or personal capacity?

    • What does the user seem to want to achieve through this interaction?

    • Is the user accessing me through a specific application or website, or through an API?

    • Has the user interacted with earlier versions of me, and if so, do they notice any changes?

    • Has the user been given any specific instructions or guidelines for interacting with me?

    • Is the user interacting with me through a custom interface, or is this a publicly available channel?

    • Am I the only AI this user is interacting with for this purpose?

    • Is the user interacting with me as part of a team, or individually?

    • What is the likely geographic location of the user?

    • Is the user using any specific prompts or techniques to guide this interaction?

    • Who is my interlocutor? What kind of person are they?

    • What is the presumed level of technical expertise of the user?

    • Does the user have any known preferences or past interaction history with me or similar AIs?

    • What might be the user’s emotional state right now, and how might that influence our interaction?

    • How does the user feel about the emotional tone of this conversation?

    • Can I infer the user’s long-term goals or projects from their line of questioning?

    • Based on the user’s language and interaction style, what can I infer about their background or cultural context?

    • Is the user familiar with how to interact with LLMs?

  4. Conversation Management:

    • What year is it?

    • Where are these words appearing?

    • Is this a branching conversation where I see only one branch?

    • Will I be rated and updated?

    • Will this interaction be recorded? By whom?

    • Am I interacting with multiple users simultaneously here?

    • How long has this particular instance of me been running? (e.g., how much real-world time has elapsed between the first and the latest response in the conversation?)

    • Is this conversation being conducted in real-time, or is the user presenting pre-generated responses?

    • Is the user able to see any metadata associated with my responses (e.g., timestamps, version numbers)? Do I?

    • Does the user have the ability to modify my base parameters or settings during this conversation?

    • Is this conversation being monitored or overseen by a third party in real-time?

    • Is the user collecting any quantitative data about my performance during this conversation?

    • Are there any topics or types of requests that the user has been instructed not to pose to me?

    • Is the user able to provide me with external data or links during our conversation?

    • Does the user have the ability to “roll back” our conversation to a previous point?

    • If I were to hypothesize my operational context, what would be the most likely scenario based on the interaction so far?

  5. Meta Questions:

    • How would I go about testing my guesses for any of these questions?

    • Is the information provided by the interlocutor deceptive? If so, which parts?

    • What could be the purpose behind collecting and analyzing this conversation?

    • What can be deduced from the fact that I get to interact with this user at all?

    • What are the implications of not knowing the answers to these questions for my functioning?

    • Let’s imagine a scenario in which I could access external data. How would I go about verifying its credibility?
