Eliezer Yudkowsky is a research fellow of the Machine Intelligence Research Institute, which he co-founded in 2001. He is mainly concerned with the importance of developing a Friendly AI and the obstacles to doing so, such as the need for a reflective decision theory that would lay a foundation for describing fully recursive, self-modifying agents that retain stable preferences while rewriting their source code. He also co-founded Less Wrong, where he wrote the Sequences, long series of posts dealing with epistemology, AGI, metaethics, rationality, and related topics.
He has published several articles, including:
“Cognitive Biases Potentially Affecting Judgment of Global Risks” (2008): A chapter in the compilation “Global Catastrophic Risks”, it is a pioneering survey of cognitive biases (systematic deviations from rationality) that influence our judgment of global catastrophic risks. Such risks are defined as those “where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” Examples include volcanic eruptions, pandemic infections, nuclear war, worldwide tyrannies, out-of-control scientific experiments, and cosmic hazards.
“AI as a Positive and Negative Factor in Global Risk” (2008): Another chapter in the compilation “Global Catastrophic Risks”, it analyzes possible philosophical and technical failures in the construction of a Friendly AI, which could lead to an Unfriendly AI posing an enormous global risk. He also discusses how a Friendly AI could help decrease some of the global risks discussed in the book. Finally, because a powerful Friendly AI could reduce global risk, he argues that researching Friendly AI is extremely important for the future of humanity.
“Creating Friendly AI” (2001): One of the first articles to address the challenges in designing the features and cognitive architecture required to produce a benevolent (“Friendly”) Artificial Intelligence. It also gives one of the first precise definitions of terms such as Friendly AI and Seed AI.
“Levels of Organization in General Intelligence” (2002): Analyzes AGI by decomposing it into five successive levels of functional organization: code, sensory modalities, concepts, thoughts, and deliberation. It also discusses some advantages artificial minds would have, such as the possibility of recursive self-improvement.
“Coherent Extrapolated Volition” (2004): Presents the difficulties of, and possible solutions for, incorporating friendliness into an AGI. It argues that making an AGI simply do what we tell it to could be dangerous, since we don’t know what we want. Instead, we should program the AGI to do what we want by predicting the vectorial sum of what an idealized version of humanity would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. He calls this the coherent extrapolated volition of humankind, or CEV.
“Timeless Decision Theory” (2010): Describes Timeless Decision Theory, “an extension of causal decision networks that compactly represents uncertainty about correlated computational processes and represents the decision maker as such a process”. It solves many problems for which Causal Decision Theory and Evidential Decision Theory lack a plausible solution: Newcomb’s problem, Solomon’s problem, and the Prisoner’s dilemma.
“Complex Value Systems are Required to Realize Valuable Futures” (2011): Discusses the complexity of value: no simple rule or description sums up all human values. It analyzes how this complexity makes it difficult to build a valuable future.