Demis Hassabis has already announced in an interview that they’ll be working on a StarCraft bot.
What is your preferred backup strategy for your digital life?
I meant that for AI we will possibly require high-level credit assignment, e.g. experiences of regret like “I should be more careful in these kinds of situations”, or the realization that one particular strategy out of the entire sequence of moves worked out really nicely. Instead, it penalizes or reinforces all moves of one game equally, which is potentially a much slower learning process. It turns out Go can be played well without much structure in the credit assignment process, hence I said the problem is non-existent, i.e. there wasn’t even a need to consider it and thereby further our understanding of RL techniques.
“Nonexistent problems” was meant as hyperbole, to say that these problems weren’t solved in interesting ways and are extremely simple in this setting because the states and rewards are noise-free. I am not sure what you mean by the second question. They just apply gradient descent over the entire history of moves of the current game such that the expected reward is maximized.
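To make that uniform credit assignment concrete, here is a minimal sketch (a toy softmax policy over a made-up discrete game, not AlphaGo’s actual training code): every move of a game receives the identical end-of-game result as its learning signal.

```python
# Toy REINFORCE-style update: each (state, action) pair of a finished game is
# reinforced or penalized by the same scalar outcome, with no per-move attribution.
import numpy as np

N_STATES, N_ACTIONS = 10, 4
theta = np.zeros((N_STATES, N_ACTIONS))          # policy parameters (logits per state)

def policy(state):
    logits = theta[state]
    p = np.exp(logits - logits.max())            # numerically stable softmax
    return p / p.sum()

def reinforce_update(trajectory, outcome, lr=0.1):
    """trajectory: list of (state, action); outcome: +1 win / -1 loss for the whole game."""
    for state, action in trajectory:
        p = policy(state)
        grad_log = -p                            # d log softmax / d logits
        grad_log[action] += 1.0
        theta[state] += lr * outcome * grad_log  # same scalar weight for every move

# One (fictional) won game of three moves; all three moves are reinforced equally.
reinforce_update([(0, 2), (3, 1), (7, 0)], outcome=+1)
```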
Yes, but as I wrote above, the problems of credit assignment, reward delay and noise are non-existent in this setting, and hence their work does not contribute at all to solving AI.
I think what this result says is this: “Any task humans can do, an AI can now learn to do better, given a sufficient source of training data.”
Yes, but that would likely require an extremely large amount of training data, because preparing actions for many kinds of situations means covering an exponentially blowing-up number of combinations of possibilities, and hence the model would need to be huge as well. It would also require high-quality data sets with simple correction signals in order to work, which are expensive to produce.
I think, above all, building a real-time AI requires reuse of concepts, so that abstractions can be recombined and adapted to new situations; and concept-based prediction (reasoning) requires one-shot learning, so that trains of thought can be memorized and built upon. In addition, the entire network somehow needs to learn to determine which of its parts were responsible in the past for current reward signals that are delayed and noisy. If there is a simple and fast solution to this, then AGI could be right around the corner. If not, it could take several decades of research.
I agree. I don’t find this result any more or less indicative of near-term AI than Google’s success on ImageNet in 2012. The algorithm learns to map positions to moves and values using CNNs, just as CNNs can be used to learn mappings from images to hundreds of classes of dog breeds and more. It turns out that Go really is a game about pattern recognition, and that with a lot of data you can replicate the pattern detection for good moves in an essentially supervised way (one could even call their reinforcement learning supervised, because the nature of the problem gives you credit assignment for free).
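As a rough illustration of that analogy (not DeepMind’s actual architecture; the class name, layer sizes, and input planes below are made up), the same kind of convolutional network used for image classification can be wired to map a 19x19 board position to move probabilities and a scalar value estimate:

```python
# Sketch of a position -> (move probabilities, value) CNN, analogous to an image classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGoNet(nn.Module):
    def __init__(self, in_planes=17, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.policy_head = nn.Linear(channels * 19 * 19, 19 * 19)  # one logit per board point
        self.value_head = nn.Linear(channels * 19 * 19, 1)         # position evaluation

    def forward(self, board_planes):
        x = F.relu(self.conv1(board_planes))
        x = F.relu(self.conv2(x))
        x = x.flatten(1)
        move_probs = F.softmax(self.policy_head(x), dim=1)
        value = torch.tanh(self.value_head(x))
        return move_probs, value

# "Supervised" pattern recognition: fit the policy head to expert moves,
# just as an image classifier is fit to labelled dog breeds.
net = ToyGoNet()
probs, value = net(torch.zeros(1, 17, 19, 19))
```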
Then which blogs do you agree with on the matter of the refugee crisis? (My intent is just to crowd-source some well-founded opinions, because I lack one of my own.)
What are your thoughts on the refugee crisis?
Just speaking of weaknesses of the paperclip maximizer thought experiment: I’ve seen this misunderstanding in at least 4 out of 10 cases where the thought experiment was brought up.
I think many people intuitively distrust the idea that an AI could be intelligent enough to transform matter into paperclips in creative ways, but ‘not intelligent enough’ to understand its goals in a human and cultural context (i.e. to satisfy the needs of the business owners of the paperclip factory). This is often due to the confusion that the paperclip maximizer would get its goal function from parsing the sentence “make paperclips”, rather than from a preprogrammed reward function, for example a CNN that is trained to map images to a scalar reward based on the number of paperclips they show.
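A toy sketch of that distinction (the names and architecture below are made up for illustration): the maximizer optimizes a pre-programmed reward model, e.g. a small CNN regressor that estimates how many paperclips an image contains, and nothing in that function encodes the owners’ intent or the parsed meaning of “make paperclips”.

```python
# Illustrative pre-programmed reward function: a frozen CNN that estimates
# the number of paperclips in an image. The agent maximizes this scalar only.
import torch
import torch.nn as nn

class PaperclipCounter(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.count = nn.Linear(16, 1)   # estimated number of paperclips

    def forward(self, image):
        return self.count(self.features(image).flatten(1))

reward_model = PaperclipCounter()       # imagine this trained on labelled images, then frozen

def reward(world_image):
    # The agent optimizes this number; human context is nowhere in the loop.
    return reward_model(world_image).item()

print(reward(torch.zeros(1, 3, 64, 64)))
```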
I think the problem here is the way the utility function is chosen. Utilitarianism is essentially a formalization of the reward signals in our heads. It is a heuristic way of quantifying what we expect a healthy human (one that can grow up and survive in a typical human environment and has an accurate model of reality) to want. All of this only converges roughly to a common utility because we have evolved to have the same needs, which are necessarily pro-life and pro-social (since otherwise our species wouldn’t be present today).
Utilitarianism crudely abstracts from the meanings in our heads that we recognize as common goals and assigns numbers to them. We have to be careful what we assign numbers to in order to get the results we want in all corner cases. I think hooking up the utility meter to neurons that detect minor inconveniences is not a smart way of achieving what we collectively want, because it might contradict our pro-life and pro-social needs. Only when the inconveniences accumulate in an individual, so that they condense into states of fear or anxiety or noticeably shorten human life, do they affect human goals, and only then does it make sense to include them in utility considerations (which, again, are only a crude approximation of what we have evolved to want).
Why does E. Yudkowsky voice such strong priors, e.g. with respect to the laws of physics (the many-worlds interpretation), when much weaker priors seem sufficient for most of his beliefs (e.g. weak computationalism/computational monism) and wouldn’t make him so vulnerable? (By “vulnerable” I mean that his work often gets ripped apart as cultish pseudoscience.)
I would love to see some hard data on the correlation between public interest in science and its degree of ‘cult status’ vs. ‘open science’.
I mean “only a meme” in the sense that morality is not absolute but an individual choice. Of course, there can be arguments for why some memes are better than others; that happens when individuals convince each other of their preferences.
Is it? I think the act of convincing other people of your preferred state of the world is exactly what justifying morality is. But that action policy is only a meme, as you said, which is individually chosen based on many criteria (including aesthetics, peer pressure, and consistency).
Moral philosophy is a huge topic, and its discourse is not dominated by looking at DNA.
Everyone can choose their preferred state then, at least to the extent it is not indoctrinated or biologically determined. It is rational to invest energy into maintaining or achieving this state (because the state presumably provides you with a steady source of reward), which might involve convincing others of your preferred state or preventing them from threatening it (e.g. by putting them into jail). There is likely an absolute truth (to the extent physics is consistent from our point of view), but no absolute morality (because it’s all memes in an undirected process). Terrorists do nothing wrong from their point of view, but from mine their actions threaten my preferred state, so I will try to prevent terrorism. We may seem lucky that many preferred states converge to the same fairly sustainable goals, but that is just an evolutionary necessity, and perhaps mostly a result of empathy and the will to survive (otherwise our species wouldn’t have survived in paleolithic groups of hunters and gatherers).
What are the implications of that for how we decide what the right things to do are?
Because then it would argue from features that are built into us. If we can prove the existence of these features with high certainty, then it could perhaps serve as guidance for our decisions.
On the other hand, it is reasonable that evolution does not create such goals because it is an undirected process. Our actions are unrestricted in this regard, and we must only bear the consequences of the system that our species has come up with. What is good is thus decided by consensus. Still, the values we have converged to are shaped by the way we have evolved to behave (e.g. empathy and pain avoidance).
Deutsch briefly summarized his view on AI risks in this podcast episode: https://youtu.be/J21QuHrIqXg?t=3450 (Unfortunately there is no transcript.)
What are your thoughts on his views apart from what you’ve touched upon above?