Desired articles on AI risk?

lukeprogNov 2, 2012, 5:39 AM

19 points

I’ve once again updated my list of forthcoming and desired articles on AI risk, which currently names 17 forthcoming articles and books about AGI risk, and also names 26 desired articles that I wish researchers were currently writing.

But I’d like to hear your suggestions, too. Which articles not already on the list as “forthcoming” or “desired” would you most like to see written, on the subject of AGI risk?

Book/article titles reproduced below for convenience...

Forthcoming

Superintelligence: Groundwork for a Strategic Analysis by Nick Bostrom
Singularity Hypotheses, edited by Amnon Eden et al.
Singularity Hypotheses, Vol. 2, edited by Vic Callaghan
“General Purpose Intelligence: Arguing the Orthogonality Thesis” by Stuart Armstrong
“Responses to AGI Risk” by Kaj Sotala et al.
“How we’re predicting AI… or failing to” by Stuart Armstrong & Kaj Sotala
“A Comparison of Decision Algorithms on Newcomblike Problems” by Alex Altair
“A Representation Theorem for Decisions about Causal Models” by Daniel Dewey
“Reward Function Integrity in Artificially Intelligent Systems” by Roman Yampolskiy
“Bounding the impact of AGI” by Andras Kornai
“Minimizing Risks in Developing Artificial General Intelligence” by Ted Goertzel
“Limitations and Risks of Machine Ethics” by Miles Brundage
“Universal empathy and ethical bias for artificial general intelligence” by Alexey Potapov & Sergey Rodiono
“Could we use untrustworthy human brain emulations to make trustworthy ones?” by Carl Shulman
“Ethics and Impact of Brain Emulations” by Anders Sandberg
“Envisioning The Economy, and Society, of Whole Brain Emulations” by Robin Hanson
“Autonomous Technology and the Greater Human Good” by Steve Omohundro

Desired

“AI Risk Reduction: Key Strategic Questions”
“Predicting Machine Superintelligence”
“Self-Modification and Löb’s Theorem”
“Solomonoff Induction and Second-Order Logic”
“The Challenge of Preference Extraction”
“Value Extrapolation”
“Losses in Hardscrabble Hell”
“Will Values Converge?”
“AI Takeoff Scenarios”
“AI Will Be Maleficent by Default”
“Biases in AI Research”
“Catastrophic Risks and Existential Risks”
“Uncertainty and Decision Theories”
“Intelligence Explosion: The Proportionality Thesis”
“Hazards from Large Scale Computation”
“Tool Oracles for Safe AI Development”
“Stable Attractors for Technologically Advanced Civilizations”
“AI Risk: Private Projects vs. Government Projects”
“Why AI researchers will fail to hit the narrow target of desirable AI goal systems”
“When will whole brain emulation be possible?”
“Is it desirable to accelerate progress toward whole brain emulation?”
“Awareness of nanotechnology risks: Lessons for AI risk mitigation”
“AI and Physical Effects”
“Moore’s Law of Mad Science”
“What Would AIXI Do With Infinite Computing Power and a Halting Oracle?”
“AI Capability vs. AI Safety”

lukeprogNov 2, 2012, 5:39 AM

19 points

26 comments2 min readLW link Archive

Giles Nov 2, 2012, 3:19 PM
17 points

“Why If Your AGI Doesn’t Take Over The World, Somebody Else’s Soon Will”

i.e. however good your safeguards are, it doesn’t help if:
- another team can take your source code and remove safeguards (and why they might have incentives to do so)
- Multiple discovery means that your AGI invention will soon be followed by 10 independent ones, at least one of which will lack necessary safeguards
EDIT: “safeguard” here means any design feature put in to prevent the AGI obtaining singleton status.
Gedusa Nov 2, 2012, 3:44 PM
10 points

Something on singletons: desirability, plausibility, paths to various kinds (strongly relates to stable attractors)

“Hell Futures—When is it better to be extinct?” (not entirely serious)
- wedrifid Nov 3, 2012, 12:22 AM
  4 points
  Parent
  
  
  “Hell Futures—When is it better to be extinct?” (not entirely serious)
  
  Why (not serious)?
Giles Nov 2, 2012, 5:58 PM
8 points

I’d be interested to see a critique of Hanson’s em world, but within the same general paradigm (i.e. not “that won’t happen because intelligence explosion”).

e.g.
- ems would respect our property rights why exactly?
- how useful is analysis given “ems behave just like fast copyable humans” assumption probably won’t be valid for long?
- DaFranker Nov 2, 2012, 6:18 PM
  4 points
  Parent
  
  
  how useful is analysis given “ems behave just like fast copyable humans” assumption probably won’t be valid for long?
  
  Yeah, I don’t see how that assumption could last long.
  
  Make me an upload, and suddenly you’ve got a bunch of copies learning a bunch of different things, and another bunch of copies experimenting and learning on how to create diff patches to do stable knowledge merging from multiple studying branch copies. Wouldn’t be long before the trunk mind becomes a supergenius polyexpert if not an outright general superintelligence, if it works.
  
  That’s just one random way things could go weird out of many others anyone could think of.
  - Giles Nov 2, 2012, 7:16 PM
    3 points
    Parent
    
    I think Hanson comes at this from the angle of “let’s apply what’s in our standard academic toolbox to this problem”. I think there might be people who find this approach convincing who would skim over more speculative-sounding stuff, so I think that approach might be worth pursuing.
    
    I really don’t disagree with your analysis but I wonder which current academic discipline comes closest to being able to frame this kind of idea?
amcknight Nov 8, 2012, 10:06 PM
7 points

A Survey of Mathematical Ethics which covers work in multiple disciplines. I’d love to know what parts of ethics have been formalized enough to be written mathematically and, for example, any impossibility results that have been shown.
- Caspar Oesterheld May 6, 2016, 10:50 PM
  1 point
  Parent
  
  Regarding impossibility results, there is now also Brian Tomasik’s Three Types of Negative Utilitarianism.
  
  There are also these two attempted formalizations of notions of welfare:
  - Daswani and Leike (2015): A Definition of Happiness for Reinforcement Learning Agents.
  - Formalizing preference utilitarianism in physical world models, which I have written.
- lukeprog Nov 17, 2012, 9:47 AM
  0 points
  Parent
  
  
  impossibility results
  
  Here’s one.
novalis Nov 2, 2012, 6:52 AM
6 points

Something on logical uncertainty

Why the hard problems of AI (mainly, how to represent the world) are likely to ever be solved.
SilasBarta Nov 4, 2012, 12:21 AM
5 points

What are your requirements for the desired articles? Is it sufficient that, say, the request respondent read the abstracts of all relevant papers, and then summarizes and cites them? If so, I can knock out a few of these soon.
Manfred Nov 2, 2012, 8:26 AM
4 points


“What Would AIXI Do With Infinite Computing Power and a Halting Oracle?”

This is a fun one, but even easier would be “What would AIXI do with Aladdin’s genie?”
jmmcd Nov 4, 2012, 6:43 PM
3 points

“Experiments we could run today on a laptop which might tell us something about AI risk”

It might be very short, haha. But if not, it would be interesting!
Bruno_Coelho Nov 3, 2012, 2:48 AM
2 points

It seems that no one is working in papers about the convergence of values. In a scale of difficult, math problems seens the priority, but the value and preferences disagreement impose a constrain in the implementantion. More specifically, in the “programmer writing code with black boxes” part.
John_Maxwell Nov 2, 2012, 7:21 PM
2 points

How does making LW posts about these compare to writing papers with a more academic focus?
blogospheroid Nov 2, 2012, 4:01 PM
2 points

What is the proportionality thesis in the context of Intelligence Explosion?

The one I googled says something about the worst punishments for the worst crimes.
- Kaj_Sotala Nov 2, 2012, 4:51 PM
  10 points
  Parent
  
  From David Chalmers’ paper:
  
  We might call this assumption a proportionality thesis: it holds that increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems. Perhaps the most promising way for an opponent to resist is to suggest that this thesis may fail. It might fail because here are upper limits in intelligence space, as with resistance to the last premise. It might fail because there are points of diminishing returns: perhaps beyond a certain point, a 10% increase in intelligence yields only a 5% increase at the next generation, which yields only a 2.5% increase at the next generation, and so on. It might fail because intelligence does not correlate well with design capacity: systems that are more intelligent need not be better designers. I will return to resistance of these sorts in section 4, under “structural obstacles”.
  - lukeprog Nov 2, 2012, 5:45 PM
    7 points
    Parent
    
    Also note that Chalmers (2010) says that perhaps “the most promising way to resist” the argument for intelligence explosion is to suggest that the proportionality thesis may fail. Given this, Chalmers (2012) expresses “a mild disappointment” that of the 27 authors who commented on Chalmers (2010) for a special issue of Journal of Consciousness Studies, none focused on the proportionality thesis.
    - blogospheroid Nov 3, 2012, 2:59 AM
      0 points
      Parent
      
      Thank you! Kaj and Luke. I am reading the singularity reply essay by Chalmers right now.
negamuhia Nov 4, 2012, 2:57 PM
1 point


“AI Will Be Maleficent By Default”

seems like an a priori predetermined conclusion (bad science, of the “I want this to be true” kind), rather than a research result (good problem statement for AGI risk research). A better title would be rephrased as a research question:

“Will AI Be Maleficent By Default?”
- pengvado Nov 5, 2012, 7:16 PM
  5 points
  Parent
  
  If you’ve already done the research, and the wishlist entry is just for writing an article about it, then putting your existing conclusion in the title is fine.
gwern Dec 18, 2012, 6:18 PM
0 points


“Autonomous Technology and the Greater Human Good” by Steve Omohundro

Summary slides: http://selfawaresystems.files.wordpress.com/2012/12/autonomous-technology-and-the-greater-human-good.pdf
vallinder Nov 8, 2012, 5:47 PM
0 points


“What Would AIXI Do With Infinite Computing Power and a Halting Oracle?”

Is this problem well-posed? Doesn’t the answer depend completely on the reward function?
diegocaleiro Nov 5, 2012, 12:34 PM
0 points

I’d like to hear futther commentary on “Stable Attractors of Technologically Advanced Civilizations” for a few reasons. 1) If it relates to cultural evolution, it relates to my masters 2) Seems like a sociology/memetics problem, areas I studied for a while. 3) Whoever would like to tackle it, I’d like to cooperate.
DaFranker Nov 2, 2012, 2:38 PM
0 points

I wish I were confident enough in my skills, knowledge and rationality to actually work on some of these papers. “Self-Modification and Löb’s Theorem” is exactly what comes up when I query my brain for “awesome time spent in a cave dedicated entirely to solving X”. All the delicious mind-bending recursion.

Hopefully in a few years I’ll look back on this and chuckle. “Hah, to think that back then I thought of simple things like self-modification and Löb’s Theorem as challenges! How much stronger I’ve become, now.”

More on topic, I feel like there’s a need for something addressing the specific AGI vs Specialized AI questions/issues. Among things that pop to mind, why a self-modifying “specialized” AI will, given enough time and computing power, just end up becoming a broken paperclip-maximizing AGI anyway—even if it’s “grown” from weak Machine Learning algorithms.
amcknight Nov 8, 2012, 10:09 PM
−3 points

A Survey of Mathematical Ethics which covers work in multiple disciplines. I’d love to know what parts of ethics have been formalized enough to be written mathematically and, for example, any impossibility results that have been shown.