I think this is a good overview, but most of the views proposed here seem contentious, and the arguments given in their support are unlikely to change the mind of anyone who has thought about these questions for a while or who is aware of the disagreements about them within the community.
Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.
If your values differ from those of the average human, then this may not be true or relevant. For example, I would guess that, for a utilitarian, current average human values are worse than, say, 90% “paperclipping values” and 10% classical utilitarianism.
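(A purely illustrative way to read that comparison, where the symbols are my own shorthand rather than anything defined above: writing $U_{\text{clip}}$ for a “paperclipping” value function and $U_{\text{util}}$ for classical utilitarian value, the 90/10 split could mean a weighted value function
$$V = 0.9\,U_{\text{clip}} + 0.1\,U_{\text{util}},$$
or it could mean a lottery in which the future is optimized for $U_{\text{clip}}$ with probability 0.9 and for $U_{\text{util}}$ with probability 0.1. The two readings are not equivalent in general, since an agent optimizing the weighted sum may behave very differently from a 10% chance of a fully utilitarian optimizer, so which one is intended matters for the comparison with average human values.)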
Given timeline uncertainty, it’s best to spend marginal effort on plans that assume / work in shorter timelines.
Stated simply: If you don’t know when AGI is coming, you should make sure alignment gets solved in worlds where AGI comes soon.
I guess the question is what “soon” means. I agree with the argument provided in the quote, but there are also some arguments for working on longer timelines, e.g.:
If alignment is hard and most of the value comes from full alignment, then why even try to optimize for very short timelines?
Similarly, there is a “social” difficulty in getting people in AI to notice your (or the AI safety community’s) work. Even if you think you could write down, within a month, a recipe that significantly increases the probability of AI being aligned, you would probably need much more than a month to make it significantly more likely that people will consider applying that recipe.
It seems obvious that most people shouldn’t think too much about extremely short timelines (<2 years) or the longest plausible timelines (>300 years). So these arguments together probably point to something in between, and the question is where. Of course, it also depends on one’s beliefs about AI timelines.
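(To make the trade-off explicit, here is a toy sketch in which every symbol is my own assumption rather than anything from the post: let $p$ be your credence that AGI arrives within some short horizon, $v_s$ the value of a marginal unit of work targeted at short timelines conditional on short timelines, and $v_l$ the analogous value of long-timeline work conditional on long timelines, ignoring, e.g., that short-timeline work may retain some value in long-timeline worlds. Then short-timeline work looks better roughly when
$$p\,v_s > (1-p)\,v_l,$$
and the difficulty and “social” adoption points above are reasons to think $v_s$ shrinks as the targeted horizon gets very short, which is what pushes the optimum away from the extremes.)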
To me it seems that the concrete recommendations (aside from the “do AI safety things”) don’t have anything to do with the background assumptions.
As one datapoint, fields like computer science, engineering and mathematics seem to make a lot more progress than ones like macroeconomics, political theory, and international relations.
For one, “citation needed”. But also: the alternative to doing technical AI safety work isn’t to do research in politics but to do political activism (or lobbying or whatever), i.e. to influence government policy.
As it currently stands, your “technical rather than political” point applies to any problem, and it is obviously invalid at that level of generality. To argue convincingly that technical work on AI safety is more important than AI strategy (which may well be true), you’d have to appeal to some specifics of the problems related to AI.
If gains from trade between value systems are big, then a lot of value may come from ensuring that the AI engages in acausal trade (https://wiki.lesswrong.com/wiki/Acausal_trade). This is doubly persuasive if you already see your own policies as determining what agents with similar decision theories but different values do elsewhere in the universe. (See, e.g., section 4.6.3 of “Multiverse-wide Cooperation via Correlated Decision Making”.)
Yeah, that sounds right to me. Most of the value is probably spread between that and breaking out of our simulation, but I haven’t put much thought into it. There are other crucial considerations too (e.g., how to deal with an infinite universe). Thanks for pointing out the nuanced ways in which what I said was wrong; I’ll reflect more on what true sentiment my intuitions are pointing to (if the sentiment is indeed true at all).
Are those (the 90% and 10%) probabilities, or weightings for taking a weighted average? And if the latter, what does that even mean?