I think it’s about a 0.75 probability, conditional on smarter-than-human AI being developed. I guess I’m kind of an optimist. TL;DR: I don’t think it will be very difficult to impart your intentions to a sufficiently advanced machine.
I haven’t seen any parts of GiveWell’s analyses that involve looking for the right buzzwords. Of course, it’s possible that certain buzzwords subconsciously manipulate people at GiveWell in certain ways, but the same can be said for any group, because every group has some sort of values.
Why do you expect that to be true?
Because they generally emphasize these values and practices when others don’t, and because they are part of a common tribe.
How strongly? (“Ceteris paribus” could be consistent with an extremely weak effect.) Under what criterion for classifying people as EAs or non-EAs?
Somewhat weakly, but not extremely weakly. Obviously there is no single clear criterion; it’s just a matter of people’s philosophical values and individual commitment. At most, I think that being a solid EA is about as important as having a couple of additional years of relevant experience or schooling.
I do think that if you had a research-focused organization where everyone was an EA, it would be better to hire outsiders at the margin, because of the problems associated with homogeneity. (This wouldn’t be the case for community-focused organizations.) I guess it just depends on where they are right now, which I’m not too sure about. If you’re only going to have one person doing the work, e.g. with an EA fund, then it’s better for it to be done by an EA.
I bet that most of the people who donated to GiveWell’s top charities were, for all intents and purposes, assuming their effectiveness in the first place. From the donor end, there were assumptions being made either way (and there must be; it’s impractical to do every kind of evaluation on one’s own).
I think EA is something very distinct in itself. I do think that, ceteris paribus, it would be better to have a fund run by an EA than a fund not run by an EA. Firstly, I have a greater expectation for EAs to trust each other, engage in moral trades, be rational and charitable about each other’s points of view, and maintain civil and constructive dialogue than I do for other people. And secondly, EA simply has the right values. It’s a good culture to spread, which involves more individual responsibility and more philosophical clarity. Right now it’s embryonic enough that everything is tied closely together. I tentatively agree that that is not desirable. But ideally, growth of thoroughly EA institutions should lead to specialization and independence. This will lead to a much more interesting ecosystem than if the intellectual work is largely outsourced.
It seems to me that GiveWell has already acknowledged perfectly well that VillageReach is not a top effective charity. It also seems to me that there are lots of reasons one might take GiveWell’s recommendations seriously, and that getting “particularly horrified” about their decision not to research exactly how much impact their wrong choice didn’t have is a rather poor way to conduct any sort of inquiry into the accuracy of organizations’ decisions.
In fact, it seems to me that the less intelligent an organism is, the more easily its behavior can be approximated with a model that has a utility function!
Only because those organisms have fewer behaviors in general. If you put a human in an environment where their options and sensory inputs were as simple as those experienced by apes and cats, the human would probably look like an equally simple utility maximizer.
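A toy sketch of that point (my own illustration, not anything from the exchange above): when an agent only ever faces a handful of options, any deterministic choice behavior can be “explained” after the fact by a utility function, so the utility-maximizer description does very little work.

```python
# Toy sketch (hypothetical agent and options): any deterministic choice behavior
# over a small, fixed option set can be rationalized by a utility function fit
# after the fact.

# Observed choices of a simple agent: in each situation it picked one option.
observed_choices = {
    "hungry": "eat",
    "tired": "sleep",
    "threatened": "flee",
}

def fit_utility_function(choices):
    """Assign utility 1 to whatever was chosen in each situation, 0 to everything else."""
    def utility(situation, option):
        return 1.0 if choices.get(situation) == option else 0.0
    return utility

utility = fit_utility_function(observed_choices)
options = ["eat", "sleep", "flee"]

# The fitted "utility maximizer" reproduces the behavior perfectly, which shows
# how little the label explains when the behavioral repertoire is this small.
for situation, chosen in observed_choices.items():
    best = max(options, key=lambda o: utility(situation, o))
    assert best == chosen
    print(situation, "->", best)
```

The same trick gets much harder, and much less trivially true, as the space of situations and behaviors grows, which is the asymmetry the comment is pointing at.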
Kantian ethics: do not violate the categorical imperative. It’s derived logically from the status of humans as rational autonomous moral agents. It leads to a society where people’s rights and interests are respected.
Utilitarianism: maximize utility. It’s derived logically from the goodness of pleasure and the badness of pain. It leads to a society where people suffer little and are very happy.
Virtue ethics: be a virtuous person. It’s derived logically from the nature of the human being. It leads to a society where people act in accordance with moral ideals.
Etc.
pigs strike a balance between the lower suffering, higher ecological impact of beef and the higher suffering, lower ecological impact of chicken.
This was my thinking for coming to the same conclusion. But I am not confident in it. Just because something minimaxes between two criteria doesn’t mean that it minimizes overall expected harm.
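To make that concrete with made-up numbers (purely illustrative, not from the discussion): score each option as a pair (animal suffering, ecological impact), say beef $(2, 8)$, chicken $(8, 2)$, and pork $(5, 5)$. Pork minimizes the worse of the two components, but under a weighting that treats suffering as three times as important as ecological impact, beef comes out best:

$$
\begin{aligned}
\text{beef:} &\quad 3(2) + 1(8) = 14\\
\text{chicken:} &\quad 3(8) + 1(2) = 26\\
\text{pork:} &\quad 3(5) + 1(5) = 20
\end{aligned}
$$

So the minimax choice and the harm-minimizing choice come apart as soon as the two criteria get unequal weights.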
All of the architectures assumed by people who promote these scenarios have a core set of fundamental weaknesses (spelled out in my 2014 AAAI Spring Symposium paper).
The notion of superintelligence at stake isn’t “good at inferring what people want and then deciding to do what people want”; it’s “competent at changing the environment.” And if you program an explicit definition of ‘happiness’ into a machine, its definition of what it wants, namely human happiness, is not going to change no matter how competent it becomes; there is no reason to expect that increases in competency lead to changes in values. Sure, it might be pretty easy to teach it the difference between actual human happiness and smiley faces, but that’s a simplified example meant to demonstrate a broader point. You can rephrase the goal as “fulfill the intentions of the programmers,” but then you just kick things back a level to what you mean by “intentions,” another concept which can be hacked, and so on.
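As a toy sketch of that separation (my own illustration, not code from any of the papers under discussion): treat the programmed objective as a fixed function and competence as the search budget spent optimizing it; raising the budget changes how well the objective gets hit, not what the objective is.

```python
# Toy sketch (illustrative only): "values" are a fixed, hardcoded objective;
# "competence" is how much search effort goes into optimizing it.
import random

def hardcoded_objective(state):
    # Stand-in for a programmed proxy such as "number of smiley faces".
    # Nothing in the search loop below ever rewrites this definition.
    return -(state - 42) ** 2

def optimize(objective, search_budget):
    """More budget means more competence at the SAME objective, not a different objective."""
    best_state, best_score = None, float("-inf")
    for _ in range(search_budget):
        candidate = random.randint(0, 100)
        score = objective(candidate)
        if score > best_score:
            best_state, best_score = candidate, score
    return best_state

# A weak and a strong optimizer pursue the identical objective;
# the stronger one just hits the proxy target more reliably.
print(optimize(hardcoded_objective, search_budget=5))
print(optimize(hardcoded_objective, search_budget=5000))
```

Nothing about increasing `search_budget` touches `hardcoded_objective`, which is the sense in which competence and values are separate knobs here.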
Your argument for “swarm relaxation intelligence” is strange, as there is only one example of intelligence evolving to approximate the format you describe (not seven billion; human brains are conditionally dependent, obviously), and it’s not even clear that human intelligence isn’t equally well described as goal-directed agency that optimizes for a premodern environment. The arguments in Basic AI Drives and elsewhere don’t say anything about how an AI will be engineered, so they don’t say anything about whether it is driven by logic; they only describe how it will behave, and all sorts of agents behave in generally logical ways without having explicit functions for doing so. You can optimize without having any particular arrangement of machinery (humans do as well).
Anyway, in the future when making claims like this, it would be helpful to make clear early on that you’re not really responding to the arguments that AI safety research relies upon; you’re responding to an alleged set of counter-responses to the particular objections you have raised against AI safety research.
That is why I said what I said. We discussed it at the 2014 Symposium. If I recall correctly, Steve used that strategy (although, to be fair, I do not know how long he stuck it out). I know for sure that Daniel Dewey used the Resort-to-RL maneuver, because that was the last thing he was saying as I had to leave the meeting.
So you had two conversations. I suppose I’m just not convinced that there is an issue here: I think most people would probably reject the claims in your paper in the first place, rather than accepting them and trying a different route.
I came here to write exactly what gjm said, and your response is only to repeat the assertion “Scenarios in which the AI Danger comes from an AGI that is assumed to be an RL system are so ubiquitous that it is almost impossible to find a scenario that does not, when push comes to shove, make that assumption.”
What? What about all the scenarios in IEM or Superintelligence? Omohundro’s paper on instrumental drives? I can’t think of anything which even mentions RL, and I can’t see how any of it relies upon such an assumption.
So you’re alleging that deep down people are implicitly assuming RL even though they don’t say it, but I don’t see why they would need to do this for their claims to work, nor have I seen any examples of it.
In Bostrom’s dissertation he says it’s not clear whether the number of observers or the number of observer-moments is the appropriate reference class for anthropic reasoning.
I don’t see how you are jumping to the fourth disjunct, though. Like, maybe they run lots of simulations which are very short? But surely they would run enough to outweigh humanity’s real history whichever way you measure it, assuming they have posthuman levels of computational power.
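A rough back-of-the-envelope version of that point, with arbitrary placeholder numbers of my own: if a posthuman civilization runs $N$ simulations averaging $m$ observer-moments each, the simulated total outweighs the $M$ observer-moments of the real history whenever $Nm \gg M$, and the analogous comparison works if you count observers instead of observer-moments.

$$
P(\text{simulated}) \approx \frac{Nm}{Nm + M} \approx 1 \quad \text{when } Nm \gg M.
$$

For instance, $N = 10^6$ short simulations of $m = 10^9$ observer-moments each already swamp a real history of $M = 10^{12}$ observer-moments.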
In other words, a decision theory, complete with an algorithm (so you can actually use it), and a full set of terminal goals. Not what anyone else means by “moral theory”.
When people talk about moral theories they refer to systems which describe the way that one ought to act or the type of person that one ought to be. Sure, some moral theories can be called “a decision theory, complete with an algorithm (so you can actually use it), and a full set of terminal goals,” but I don’t see how that changes anything about the definition of a moral theory.
To say that you may choose either of two actions when it doesn’t matter which one you choose, since they have the same value, isn’t to give “no guidance”.
Proves my point. That’s no different from how most moral theories respond to questions like “which shirt do I wear?” So this ‘completeness criterion’ has to be made so weak as to be uninteresting.
Among hedonistic utilitarians it’s quite normal to demand both completeness
Utilitarianism provides no guidance on many decisions: any decision where both actions produce the same utility.
Even if it is a complete theory, I don’t think that completeness is demanded of the theory; rather it’s merely a tenet of it. I can’t think of any good a priori reasons to expect a theory to be complete in the first place.
The question needs to cover how one should act in all situations, simply because we want to answer the question. Otherwise we’re left without guidance and with uncertainty.
Well first, we normally don’t think of questions like which clothes to wear as being moral. Secondly, we’re not left without guidance when morality leaves these issues alone: we have pragmatic reasons, for instance. Thirdly, we will always have to deal with uncertainty due to empirical uncertainty, so it must be acceptable anyway.
There is one additional issue I would like to highlight, an issue which is rarely mentioned or discussed. Commonly, normative ethics concerns itself only with human actions. The subspecies Homo sapiens sapiens has understandably had a special place in philosophical discussions, but the question is not inherently about only one subspecies in the universe. The completeness criterion covers all situations in which somebody should perform an action, even if this “somebody” isn’t a human being. Human successors, alien life in other solar systems, and other species on Earth shouldn’t be arbitrarily excluded.
I’d agree, but accounts of normativity which are mind- or society-dependent, such as constructivism, would have reason to make accounts of ethics for humanity different from accounts of ethics for nonhumans.
It seems like an impossible task for any moral theory based on virtue or deontology to ever be able to fulfil the criteria of completeness and consistency
I’m not sure I agree there. Usually these theories don’t fulfil those criteria because the people who construct them disagree with some of the criteria, especially #1. But it doesn’t seem difficult to make a complete and demanding form of virtue ethics or deontology.
See Omohundro’s paper on convergent instrumental drives
It seems like hedging is the sort of thing which tends to make the writer sound more educated and intelligent, if possibly more pretentious.
It’s unjustified in the same way that vitalism was an unjustified explanation of life: it’s purely a product of our ignorance.
It’s not. Suppose that the ignorance went away: a complete physical explanation of each of our qualia—“the redness of red comes from these neurons in this part of the brain, the sound of birds flapping their wings is determined by the structure of electric signals in this region,” and so on—would do nothing to remove our intuitions about consciousness. But a complete mechanistic explanation of how organ systems work would (and did) remove the intuitions behind vitalism.
I disagree. You’ve said that epiphenomenalists hold that having first-hand knowledge is not causally related to our conception and discussion of first-hand knowledge. This premise has no firm justification.
Well… that’s just what is implied by epiphenomenalism, so the justification for it is whatever reasons we have to believe epiphenomenalism in the first place. (Though most people who gravitate towards epiphenomenalism seem to do so out of the conviction that none of the alternatives work.)
Denying it yields my original argument of inconceivability via the p-zombie world.
As I’ve said already, your argument can’t show that zombies are inconceivable. It only attempts to show that an epiphenomenalist world is probabilistically implausible. These are very different things.
Accepting it requires multiplying entities unnecessarily, for if such knowledge is not causally efficacious
Well, the purpose of rational inquiry is to determine which theories are true, not which theories have the fewest entities. Anyone who rejects solipsism is multiplying entities unnecessarily.
I previously asked for any example of knowledge that was not a permutation of properties previously observed.
I don’t see why this should matter for the zombie argument or for epiphenomenalism. In the post where you originally asked this, you were confused about the contextual usage and meaning behind the term ‘knowledge.’
1%? Shouldn’t your basic uncertainty over models and paradigms be great enough to increase that substantially?
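A simple mixture calculation (with made-up numbers) illustrates the worry: even modest credence in an alternative model that assigns a much higher probability pulls the all-things-considered estimate well above 1%.

$$
P(\text{event}) = 0.9 \times 0.01 + 0.1 \times 0.3 = 0.039.
$$

So 10% credence in a paradigm under which the probability is 30% already moves a 1% in-model estimate to roughly 4%.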