GiveWell interview with major SIAI donor Jaan Tallinn
GiveWell recently released notes from their interview with Jaan Tallinn, Skype co-founder and major SIAI donor, about SIAI (link). Holden Karnofsky says:
[M]y key high-level takeaways are that:
- I appreciated Jaan’s thoughtfulness and willingness to engage in depth. It was an interesting exchange.
- I continue to disagree with the way that SIAI is thinking about the “Friendliness” problem.
- It seems to me that all the ways in which Jaan and I disagree on this topic have more to do with philosophy (how to quantify uncertainty; how to deal with conjunctions; how to act in consideration of low probabilities) and with social science-type intuitions (how would people likely use a particular sort of AI) than with computer science or programming (what properties has software usually had historically; which of these properties become incoherent/hard to imagine when applied to AGI).
I continue to be impressed by Holden’s thoughtfulness and rigor. If people want charity evaluators to start rating the effectiveness of x-risk-reducing organizations, then those organizations need to do a better job with (1) basic org effectiveness and transparency—publishing a strategic plan, whistleblower policy, and the stuff Charity Navigator expects—and with (2) making the case for the utility of x-risk reduction more clearly and thoroughly.
Luckily I am currently helping the Singularity Institute with both projects, and there are better reasons to do (1) and (2) than ‘looking good to charity evaluators’. That is a side benefit, though.
Rather than issues of philosophy or social-science intuitions, I think the problems remain in the realm of concrete action… however, there are too many assumptions left unstated on both sides to unpack them in that debate, and the focus became too narrow.
Channeling XiXiDu, someone really needs to create a formalization of the Friendly AI problem, so that these sorts of debates don’t keep running along the same lines, with the two sides talking past each other so often.
This bit (from Karnofsky):
...is probably not right. Nobody really knows how tough this problem will prove to be once we stop being able to crib from the human solution—and it is possible that progress will get tougher. However, much of the progress on the problem so far has not obviously been based on reverse-engineering the human prediction algorithm in the first place. Also, machine prediction capabilities already far exceed human ones in some domains—e.g. chess and the weather.
Anyway, this problem makes little difference either way. Machines don’t have the human pelvis to contend with, and won’t be limited to running at 200 Hz.
These ideas might inform the exchange:
- The point about the hidden complexity of wishes applies in full to specifying the fact that needs to be predicted. Such a wish is still very hard to formalize.
- If an Oracle AI is used to construct complex designs, it needs to be more than a predictor, because the space of possible designs is too big and, for example, the designs need to be understandable by the people who read them. (This is much less of a problem if the predictor just outputs a probability.) If it’s not just a predictor, it needs a clear enough specification of what parameters it’s optimizing its output for.
- What does the AI predict, for each possible answer? It predicts the consequences of having produced that answer, and then it argmaxes over possible answers. In other words, it’s not a predictor at all; it’s a full-blown consequentialist agent. (A minimal sketch of this point follows the list.)
- The greedy/unethical person scenario is not relevant for two reasons: (1) it’s not apparent that an AI can be built that gives its user significant power, for hidden-complexity-of-wishes reasons, and (2) if someone has taken over the world, the problem is still the same: what next, and how do we avoid destroying the world?
- It’s not clear in what way powerful humans/narrow-AI teams would “make SIAI’s work moot”. Controlling the world doesn’t give insight into what to do with it, or guard against fatal mistakes.
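Here is a minimal sketch of the argmax point flagged in the list above. The toy world model and utility numbers are made up purely to make the structure concrete; none of the names come from the original exchange. The point is that “predict the consequences of each possible answer, then pick the best” is structurally a consequentialist chooser rather than a passive predictor.

```python
# Minimal sketch (my own illustration): an "oracle" that scores the
# consequences of its own answers is already doing agent-like selection.

def predict_world_given_output(world_model, answer):
    """Predicted future state of the world, conditional on the system
    having printed `answer` (here just a lookup in a toy model)."""
    return world_model[answer]

def score(predicted_world):
    """How well the predicted outcome satisfies whatever specification
    the system is optimizing its output for."""
    return predicted_world["utility"]

def oracle_answer(world_model, candidate_answers):
    # For each possible answer, predict the consequences of having
    # produced it, then argmax over answers.  The selection step is what
    # makes this a consequentialist chooser, not a passive predictor.
    return max(candidate_answers,
               key=lambda a: score(predict_world_given_output(world_model, a)))

# Toy usage: two candidate answers with made-up predicted outcomes.
toy_world_model = {
    "answer A": {"utility": 0.3},
    "answer B": {"utility": 0.9},
}
print(oracle_answer(toy_world_model, ["answer A", "answer B"]))  # -> "answer B"
```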
I think Holden is making the point that the work SIAI is trying to do (i.e. sort out all the issues of how to make FAI) might be so much easier to do in the future with the help of advanced narrow AI that it’s not really worth investing a lot into trying to do it now.
Note: for anyone else who’d been wondering about Eliezer’s position on Oracle AI, see here.
...
A powerful machine couldn’t give a human “significant power”?!? Wouldn’t Page and Brin be counter-examples?
One problem with an unethical ruler is that they might trash some fraction of the world in the process of rising to power. For those who get trashed, what the ruler does afterwards may be a problem they are not around to worry about.
You mean you can’t think of scenarios where an Oracle prints out complex, human-readable designs? How about you put the Oracle into a virtual world where it observes a plan to steal those kinds of designs, and then ask it what it will observe next—as the stolen plans are about to be presented to it?
Holden seems to assume that GMAGI has access to predictive algorithms for all possible questions—this seems unlikely to me (say, 1% chance), compared to the possibility that it has the ability to write novel code for new problems. If it writes novel code and runs it, it must have some algorithm for how that code is written and what resources are used to run it—limiting that seems like the domain of SIAI research.
Holden explicitly states:
i.e., that he believes that all novel questions will have algorithms already implemented, which seems to me to be clearly his weakest assumption, if he is assuming that GMAGI is non-narrow.
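For what it’s worth, here is a rough sketch of the two readings of GMAGI being contrasted above: a fixed library of pre-existing predictive algorithms versus a system that writes novel code for new problems and runs it under explicit resource limits, the kind of limit the comment above suggests would itself need a principled specification. All names here (`PREBUILT_PREDICTORS`, `synthesize_program`, and so on) are my own hypothetical placeholders, not anything from Holden’s writeup.

```python
# Rough sketch (my own illustration) of the two readings of GMAGI.

import subprocess

# Reading (a): a fixed library of pre-existing predictive algorithms.
PREBUILT_PREDICTORS = {
    # "weather": weather_model, "chess": chess_engine, ...
}

def answer_with_fixed_library(question_kind, question):
    """Every novel question must already have an algorithm on file."""
    predictor = PREBUILT_PREDICTORS.get(question_kind)
    if predictor is None:
        raise NotImplementedError("no pre-existing algorithm for this question")
    return predictor(question)

# Reading (b): write novel code for the problem, then run it.
def synthesize_program(question):
    """Hypothetical placeholder for the code-writing step.  How this step
    works, and what it is allowed to do, is exactly the part that would
    need careful limits."""
    return "print('prediction for: {}')".format(question)

def answer_by_writing_code(question, time_limit_seconds=60):
    """Run the freshly written code with a hard time limit, as one crude
    example of a resource constraint on its execution."""
    source = synthesize_program(question)
    completed = subprocess.run(
        ["python3", "-c", source],
        capture_output=True, text=True, timeout=time_limit_seconds)
    return completed.stdout.strip()

# Toy usage of reading (b):
print(answer_by_writing_code("will it rain tomorrow"))
```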
I thought the whole danger of a black box scenario is that humans may be unable to successfully screen unsafe improvements?
These seem like serious weaknesses in his PoV to me.
However, his points that:
a) a narrow philosophy-AI could outdo SIAI in the arena of FAI (in which case identifying the problem of FAI is 99% of SIAI’s value and is already accomplished),
b) FAI research may not be used, for whatever reason, by whatever teams DO develop AGI, and
c) Something Weird Happens
seem like very strong points diminishing the value of SIAI.
I can sympathize with his position that, as an advocate of efficient charity, he should focus on promoting actions by charitable actors that he is highly certain will be significantly more efficient, and that maintaining the mindset that highly reliable charities are preferable to highly valuable ones helps him fulfill his social role. That is, he should be very averse to a scenario in which he recommends a charity that turns out to be less efficient than the charities he recommended it over. The value of SIAI does not seem to me to be totally overdetermined.
In conclusion, I have some updating to do, but I don’t know in which direction. And I absolutely love reading serious, well-thought-out conversations by intelligent people about important subjects.