Situational Awareness

TagLast edit: Jun 16, 2023, 2:42 PM by Mateusz Bagiński

Ajeya Cotra uses the term “situational awareness” to refer to a cluster of skills including “being able to refer to and make predictions about yourself as distinct from the rest of the world,” “understanding the forces out in the world that shaped you and how the things that happen to you continue to be influenced by outside forces,” “understanding your position in the world relative to other actors who may have power over you,” “understanding how your actions can affect the outside world including other actors,” etc.

Alternatively, from an ML-perspective, situational awareness can be characterized as a strong form of out-of-context meta-learning applied to situationally-relevant statements.

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya CotraJul 18, 2022, 7:06 PM

368 points

95 comments75 min readLW link 1 review

Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs

Michaël TrazziAug 24, 2024, 4:30 AM

55 points

0 comments5 min readLW link

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu RajakumarMar 12, 2025, 5:56 PM

7 points

0 comments13 min readLW link

Paper: On measuring situational awareness in LLMs

Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, Asa Cooper Stickland, Meg and Maximilian Kaufmann

Sep 4, 2023, 12:54 PM

109 points

16 comments5 min readLW link

(arxiv.org)

Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs”

Miles TurpinOct 3, 2023, 2:22 AM

31 points

0 comments9 min readLW link

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet)Sep 4, 2024, 12:40 PM

19 points

7 comments1 min readLW link

Investigating the Ability of LLMs to Recognize Their Own Writing

Christopher Ackerman and Nina Panickssery

Jul 30, 2024, 3:41 PM

32 points

0 comments15 min readLW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

L Rudolf L, bilalchughtai, Jan Betley, kaivu, Jérémy Scheurer, Mikita Balesni, AlexMeinke, Owain_Evans and Marius Hobbhahn

Jul 8, 2024, 10:24 PM

109 points

37 comments5 min readLW link

LM Situational Awareness, Evaluation Proposal: Violating Imitation

Jacob PfauApr 26, 2023, 10:53 PM

16 points

2 comments2 min readLW link

Contingency: A Conceptual Tool from Evolutionary Biology for Alignment

clem_acsJun 12, 2023, 8:54 PM

57 points

2 comments14 min readLW link

(acsresearch.org)

The intelligence-sentience orthogonality thesis

Ben SmithJul 13, 2023, 6:55 AM

19 points

9 comments9 min readLW link

The Zeroth Skillset

katydeeJan 30, 2013, 12:46 PM

74 points

109 comments2 min readLW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdpOct 20, 2023, 7:32 AM

119 points

15 comments22 min readLW link

Facts vs Interpretations

Declan MolonyFeb 27, 2024, 7:57 AM

15 points

1 comment3 min readLW link

Emergent Misalignment and Emergent Alignment

Alvin ÅnestrandApr 3, 2025, 8:04 AM

2 points

0 comments8 min readLW link

Do models know when they are being evaluated?

Govind Pimpale, Giles, Joe Needham and Marius Hobbhahn

Feb 17, 2025, 11:13 PM

54 points

3 comments12 min readLW link

Perceptual Blindspots: How to Increase Self-Awareness

Declan MolonyMar 26, 2024, 5:37 AM

14 points

3 comments2 min readLW link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, Sam Bowman and Shi Feng

Apr 17, 2024, 9:09 PM

44 points

1 comment3 min readLW link

(tiny.cc)

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Sohaib ImranNov 16, 2024, 11:22 PM

36 points

11 comments14 min readLW link

Early situational awareness and its implications, a story

Jacob PfauFeb 6, 2023, 8:45 PM

29 points

6 comments3 min readLW link

Situational awareness in Large Language Models

Simon MöllerMar 3, 2023, 6:59 PM

31 points

2 comments7 min readLW link

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

Vika, Vikrant Varma, Ramana Kumar and Rohin Shah

Nov 25, 2022, 2:36 PM

39 points

9 comments6 min readLW link

(vkrakovna.wordpress.com)

No comments.