Watts, son
Some interesting numbers to contextualize IBM’s Watson:
90 Power 750 Express servers, each with 4 CPUs, each of those having 8 cores
Total of 15TB RAM (yep, all of Watson’s data was stored in RAM for rapid search. The human brain’s memory capacity is estimated at between 3 and 6 TB, and not all of that functions like RAM, and it’s implemented in meat.)
Each of the Power 750 Express servers seems to consume a maximum of 1,949 watts, for a total of about 175 kW for the whole computer
There also appears to be a sophisticated system connected to the Jeopardy buzzer but I can’t find power specs for that part.
IBM estimates that Watson can compute at about 80 teraflops (1 teraflop = 10^12 flops). This paper mentions in passing that the human brain operates in the petaflop range (10^15 flops), but at the same time, a brain is not a digital system and so the flop comparison is less meaningful.
To put this in perspective, a conservative upper bound for a human being standing still is about 150 W (less than 1/10 of 1% of Watson's draw), and the person just holds the buzzer and operates it with a muscular control system.
Each of the servers generates a maximum of 6,649 BTU/hour, so Watson overall would generate about 600,000 BTU/hour and require massive amounts of air conditioning. I don't have a good estimate for the cost of heat removal, but it would raise Watson's energy cost significantly.
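For a quick sanity check on these totals, here is a minimal back-of-the-envelope sketch in Python; the per-server figures and the 150 W human figure are just the numbers quoted above.

```python
# Back-of-the-envelope check of the Watson power/heat figures quoted above.
SERVERS = 90
WATTS_PER_SERVER = 1_949        # quoted max draw per Power 750 Express
BTU_PER_SERVER_HR = 6_649       # quoted max heat output per server
HUMAN_WATTS = 150               # rough upper bound for a person standing still

total_watts = SERVERS * WATTS_PER_SERVER
total_btu_hr = SERVERS * BTU_PER_SERVER_HR

print(f"Total power:  {total_watts:,} W  (~{total_watts / 1000:.0f} kW)")
print(f"Total heat:   {total_btu_hr:,} BTU/hr")
print(f"Human / Watson power ratio: {HUMAN_WATTS / total_watts:.4%}")
# -> ~175 kW, ~598,000 BTU/hr, and the human draws under 0.1% of Watson's power.
```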
I don’t mean to criticize Watson unduly; it certainly is an impressive engineering achievement and has generated a lot of good publicity and public interest in computing. The engineering feat is impressive if for no other reason than that it is the first accomplishment of this scale, and pioneering is always hard… future Watsons will be cheaper, faster, and more effective because of IBM’s great work on this.
But at the same time, the amazing power and storage costs for Watson really kind of water it down for me. I’m not surprised that if you throw power and hardware and memory at a problem, you can use rather straightforward machine learning methods to solve it. I feel similarly about Deep Blue and chess.
A more impressive Turing-style test, to me, would be building something like Watson or Deep Blue that is not allowed to consume more power than an average human, and that has comparable memory and speed. The reason this would be impressive is that in order to build it, you'd have to have some way of representing data and reasoning in the system that is efficient to a degree similar to human minds. One thing you could not do is simply concatenate an unreasonable number of large feature vectors together and overfit a machine learning model. Since this is an important open problem with lots of implications, we should use funding and publicity to drive research organizations like IBM towards that goal. Maybe building Watson is a first step and now the task is to miniaturize Watson, and in doing so, we'll be forced to learn about efficient brain architectures along the way.
Note: I gathered the numbers above by looking here and then scouring around for various listings of specific hardware specs. I’m willing to believe some of my numbers might be off, but probably not significantly.
I’d rather research focus on solving as hard a problem as we can, rather than on solving as hard a problem as we can with 150 watts of power. The latter may be more impressive in some sense, but the former solves harder problems when more than 150 watts of power are available, and more than 150 watts of power are available.
Constrained problems are often harder than unconstrained ones. They are also often more useful and lead to insights that end up being useful across a wider range of additional problems. Solving problems of human natural language with unlimited resources and paying little attention to how much power is consumed is much easier than solving the same problem with severe resource constraints. And having a solution that uses fewer resources would be way more useful to us.
When I’m writing code in an area which I don’t really understand, I often write exploratory code that uses way more resources than a polished solution would, but it’s not code I’m writing instead of polished code—I’m writing that code because it will help me learn enough to write polished, elegant code later.
Solving a problem with brute force is usually a faster path toward a clean solution than refusing to implement any solution until you’ve thought of the best one.
But you don’t start out trying to solve the problem in a hilariously inappropriate way. For example, if your boss said, “hey, sort these 10 billion numbers” you wouldn’t do simulated annealing with a cost function that penalizes unsorted entries, and then just make random swaps in the data and tell your boss to come back in 10 years when it will only probably be finished with an only probably correct answer. That’s a categorical waste of resources, not a strategic upping of resources to get a first, but still reasonable, attempt that you can then whittle into something better.
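For concreteness, here is a minimal sketch of the deliberately absurd approach described above (the cost function and annealing schedule are my own illustrative choices; a sane solution would just call a sort routine):

```python
import math
import random

def unsortedness(xs):
    """Cost: the number of adjacent pairs that are out of order."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a > b)

def anneal_sort(xs, steps=200_000, t_start=5.0, t_end=0.01):
    """'Sort' by simulated annealing over random swaps -- a categorical
    waste of resources compared to xs.sort(), which is the point."""
    xs = list(xs)
    cost = unsortedness(xs)
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = random.randrange(len(xs)), random.randrange(len(xs))
        xs[i], xs[j] = xs[j], xs[i]
        new_cost = unsortedness(xs)
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / t):
            cost = new_cost                      # accept the swap
        else:
            xs[i], xs[j] = xs[j], xs[i]          # undo the swap
    return xs

print(anneal_sort(random.sample(range(100), 20)))  # probably sorted, eventually
```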
As a machine learning researcher, my opinion is that Watson is more like simulated annealing. It’s like someone said, “Hey how can we make this thing play jeopardy without even thinking at all about how it will do the data processing… how large do we have to make it if its processing is as stupid and easy to implement as possible?”
See my other comment for more on this.
Conventional coding practice is prototype first, optimization second.
Thus, optimizing for power consumption is only really useful if you consider Watson to be a useful prototype that we’d benefit from marketing as a product. Otherwise, we’re better off waiting until we have something that actually usefully benefits from optimizations. And we’ll get there faster by being wasteful.
See two other comments. I am well aware of this idea in rapid prototyping. I'm not quibbling over being a bit wasteful because of legitimate engineering concerns. I do rapid prototyping in Python for computer vision algorithms all day, and if I handed my boss the equivalent of Watson but for object tracking, I'd get laughed out of the room. I'm not even talking about stringent power optimization; I'm only asking that the intentions of the problem solvers even be aimed at the same problem, in the same ballpark. There's a big difference between saying, "how large do we have to make this in order for a stupid, bad qualitative solution to be computationally tractable" and saying, "what is a qualitatively insightful way to solve this problem," and then upping the resources if the full-blown solution would just be a mere matter of optimization. Watson is not something that power optimization could, even in principle, reduce to efficiency on the scale of a primate brain. You have to actually represent the whole problem of natural language processing differently than they have.
I already agree that Watson is a brilliant device for generating public interest, which was its original intent. It may even be good in its new role helping to assist in medical queries. But none of these has anything to do with why I brought up the hardware constraints in the first place.
I also strongly dispute the claim that power optimization is only useful if we see Watson as a product. First of all, some people do see it that way, and second, solving natural language and cognition problems has a lot of potential benefits for society regardless of whether or not there are specific products stemming from it in the short term. You might benefit from this book when it comes out in Feb. 2012.
I think that it is very odd, verging on category error, to talk of power consumption and flops in the same breath. The flops needed are purely a function of the algorithm, while power consumption is a joint measure of the software and hardware. Power decomposes into energy per operation times operations per second, a fairly neat division into hardware and software. We expect energy per operation to fall in a smooth manner as hardware improves, while the number of operations a task requires drops discontinuously, if at all, as the software improves.
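Written out, that decomposition is just the following (my notation, not the commenter's):

```latex
% P: total power draw; E_op: energy per operation (hardware);
% R_op: operations per second (demanded by the software)
P = E_{\mathrm{op}} \times R_{\mathrm{op}}
```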
So many haters. The OP’s point is simple: curb your enthusiasm because Watson is just a very large, very costly version of something we know how to do very well. Incidentally, Watson doesn’t seem to scale particularly well, granting marginal improvements using vastly more resources. What we learn from Watson is that this technique is likely not the right direction.
Exactly. A.I. researchers really need to be familiar with these two papers, which were also linked in a comment above. Despite the Parberry analysis also linked above, I am still a believer in Strong A.I. But I definitely found it instructive to grapple with computational complexity arguments. It made me reassess certain opinions about resources that I had not thought about before.
I also like the eminent MIT mathematician B.K.P. Horn’s comments, relayed here as part of an abstract to a recent talk he gave:
I think the problem with the Turing test is that it isn’t a test of intelligence so much as a definition for it. If you deny the Turing test, you deny that intelligence (as defined) is a meaningful concept. I think there might be good reasons to make such a move.
I agree. You cannot decouple the ethical consequences of labeling an agent as “intelligent” from the choice of a definition. If intelligence is pure capacity to perform a certain action, then we might force ourselves to give look-up tables equal rights or make it illegal to unplug a future, slightly souped-up Watson. The Shieber paper alleviates some of this worry by showing how if you take a Turing test as a definition for intelligence, it functions as an interactive proof that the agent really has a general capacity for doing whatever the test requires, i.e. complexity eliminates the case of it doing brute force look-up or something which you may not want to define as intelligent.
But even so, humans judge each other to be conscious and intelligent based on pretty slim evidence (a two-minute conversation, say, or even just a sequence of posts and replies on a blog site like this one). How does anyone here know I’m not just a natural language bot like CleverBot? The short answer is that computational complexity precludes this, and so each new fluid reply I make acts like additional Bayesian evidence that p4wnc6 is not an exponential look-up bot in a table of conversation replies.
Humanity has never had to deal with possibilities like CleverBot until now. That means we're probably going to re-define what constitutes effective person-to-person interactive proof of intelligence, at least if automatons become good enough at natural language tasks. Imagine a bank's customer service webpage being run by a variant of CleverBot. Even if it "gets the behavior right" and, as far as I can tell, it responds and acts like a human using a chat client, I still might want to reserve judgment and wait until it passes some vastly more complicated Turing test before I make emotional or intuitive decisions about how to treat it or interact with it under the assumption that it is intelligent.
Similarly, if intelligence is just outward behavior, then how much hardware do I have to add to Watson before it becomes immoral to unplug the machine? If that question isn’t a function of hardware, then intelligence isn’t a function solely of behavior but also human concerns for how the behavior algorithmically happens.
This assumes that we'd give rights to computers; they might not be considered morally important even if intelligent, especially if all "intelligent" means is being conversant.
I completely agree. I am just saying you can’t divorce the decision to label something as ‘intelligent’ from the ethical discussion of what such a label might imply. If labeling something as ‘intelligent’ has no consequence whatsoever, and is just a purely quantitative measure of the capacity to perform an action, then it’s just a pedantic matter of definitions. The only part of any interest is whether something hangs in the balance over the decision to label or not label something intelligent.
Sure you can. These are two independent steps, they needn’t be taken together.
Then the first label of intelligence is merely a definition about the capacity to perform a measurable action, and any argument over it is just pedantic hair splitting. Note that I don't take this view, because in practice, lots of things hinge on whether we consider someone or something else to be intelligent. "Consider the Lobster" by David Foster Wallace is a great essay about this, though its focus is on cross-species ethical obligations and why we do or do not feel they are important. But clearly the more "sentient" something is, the more ethically we treat it and the more effort and social moral weight goes into doing so. And this is largely judged by tiny little Turing tests going on all day, every day.
People keep downvoting my comments without giving any sensible replies or criticisms. Poor form.
It might be because of your focus on definitions.
What exactly do you mean by “the ethical consequences of labeling”? It could mean something...unpopular, but it is at least ambiguous.
With current technology, anything with comparable speed to a human would use far more power.
The power drawn is as much about the efficiency of the hardware as about its size. If you're looking at how good a program is, all that should matter is the amount of memory and the number of operations performed. Watson had three to five times as much memory as a human, but it ran at one-thousandth of the speed. As awesome as the hardware is, it's still less than what people get, so the program would be extra impressive.
Humans don’t get 15TB of RAM, though. It’s unknown how much of the 3-6TB we do get operates essentially like RAM and how much like disk. And comparing flops between humans and a multiprocessor machine isn’t very useful. Human flops don’t correlate to state changes in a program the same way they do in a multiprocessor. The flops details are really comparing apples to oranges. Where flops matter for machines, in my opinion, is in simulating a brain. We know from universality that any Turing machine can simulate any other. But to simulate a brain in a multiprocessor, we’d have to deal with the realities of all of the computational overheads involved in simulating one type of computation with another. The Parberry paper I linked to in the OP treats this as an argument against strong A.I. by arguing that it won’t be feasible with engineering to actually build a machine fast enough to overcome the overhead of simulating a petaflops brain, even if the “petaflops” that the brain is performing are way more hollow than the flops on the multiprocessor machine.
Yeah, it’s not that useful, but power consumption is much less useful.
What I mean to say is that gigaflops are plenty: with a smart software design, you can achieve software efficiency close to that of a petaflops (but not digital) brain. I think there is a lot of evidence for this. But it's not at all the same with power and RAM. If Watson had the same flops as a human mind but still had the power and memory listed above, I still wouldn't feel it was a "success" if Watson could do natural language tasks well enough to statistically fool humans (i.e. pass some kind of Turing test).
When you're talking about passing a Turing test, power is absolutely key. If Watson were in the room next to me supplying replies to verbal queries, I would start to get hot from all of Watson's waste heat. It's great that it can answer natural language questions, but I'm saying a better goal for many reasons is to answer natural language questions in a manner that doesn't require massive heat removal. Invest time into figuring out how to answer natural language questions on a desktop PC with 12GB RAM. If scaling the flops down to that machine causes performance limitations, then work around them. But if the answer is: "well, having so few flops causes performance limitations, so let's just make the whole thing bigger" then it's inherently uninteresting. Evolution could only pull that lever a certain amount, which is why brain software is so impressive.
“Solve the hard problem before the easy precursor problem.”
If that wasn’t what you meant, I may have misunderstood you.
But we aren’t even up to using the kind of processing power that evolution used. Human-level reasoning in a machine will be impressive without regard to the physical characteristics of the machine it runs on. Once the problem is well-understood, we’ll get smaller and cheaper versions.
There’s a categorical difference between “try to find a reasonable solution” and “throw money at this until it’s no longer a problem” and you’re acting like there isn’t. I already made exactly the same comments you have in the OP, where I said:
But there’s a categorical difference in the two approaches. In my own field of computer vision, it’s like this: if you want to understand how face recognition works, you will study the neuroscience of primate brains and come up with compact and efficient representations of the problem that can run in a manner similar to the way primates do it. If you just want to recognize faces right now, you just concatenate every feature vector imaginable at every scale level that could conceivably be relevant and you train 10,000 SVMs over a month and then use cross-validation and mutual information to reduce that down to a “lean” set of 2,000 SVMs and there you go, you’ve overfitted a solution that still leaves face recognition as a total black box, and you use orders of magnitude more resources and time to get that solution.
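A toy sketch of the brute-force pipeline being caricatured above (synthetic data, a handful of SVMs instead of 10,000, and a single mutual-information pruning pass; all names and numbers are illustrative, not anything any actual face recognition system does):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for "every feature vector imaginable at every scale":
# concatenate several random feature blocks per sample.
n_samples, n_blocks, block_dim = 200, 8, 32
X = np.hstack([rng.normal(size=(n_samples, block_dim)) for _ in range(n_blocks)])
y = rng.integers(0, 2, size=n_samples)          # fake "same face / different face" labels

# Keep only the feature columns with the highest mutual information with the
# labels (the "reduce it down to a 'lean' set" step, shrunk to one pruning pass).
mi = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(mi)[-64:]                     # prune 256 features down to 64

# Train a small committee of SVMs on random feature subsets and keep the ones
# that cross-validate best -- the toy analogue of "10,000 SVMs -> 2,000 SVMs".
committee = []
for _ in range(20):
    cols = rng.choice(keep, size=32, replace=False)
    clf = SVC(kernel="rbf", gamma="scale")
    score = cross_val_score(clf, X[:, cols], y, cv=5).mean()
    committee.append((score, cols))
committee = sorted(committee, key=lambda t: t[0], reverse=True)[:5]  # the "lean" committee

print("best cross-val scores:", [round(s, 3) for s, _ in committee])
```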
It's interesting that researchers who spent years working on the primate brain / Barlow infomax principle idea and studying monkey face recognition at Caltech, and who couldn't do good face recognition for years, are now blowing face.com and other proprietary face recognition software out of the water.
There’s a categorical difference between even trying to solve the hard problem and resorting to using more resources when you have to, vs. just overblowing the whole thing and not even making an attempt at solving the hard problem. From what I know about natural language processing, machine learning, and Watson, Watson is the latter approach and its power and memory consumption reveal it to be quite unimpressive… though hopefully trying to miniaturize it will spawn interesting engineering research.
Yeah, I read them at different times, and missed that.
So, if we made a program that beat the Turing test, but the hardware consumed a lot of power, it would be a failure, but if we ran the program on different hardware with the exact same specs, except it was more energy efficient, it would be a success?
You’re ignoring fundamental limits of computing efficiency here. You can’t have the same specs if you have many orders of magnitude more energy efficiency. Something’s got to give. At the transistor level you can’t preserve the same amount of computation for vastly less power. This is why a petaflops human brain is not the same as a petaflops super computer. Computation is represented differently because the power constraints of a human brain force it to be. You cannot do the same amount of processing with a brain that you can do with a modern petaflops cluster. It’s the software that matters because the power constraint forces a better software design.
You could make the same arguments I have made with physical space. Watson would be more impressive if it fit inside of the real volume of an average human head. It’s just simple physics. The hardware and volume occupied are inherent to the computations it’s doing. If no attention is paid to these constraints at all, then it’s not surprising or impressive that a solution can be brute forced with stupendous resources. We’re not talking about the difference between a clean diesel sedan and a Prius. We’re talking about re-purposing a military humvee and being impressed you can use it to take the kids to soccer practice.
You’re assuming we’re at the fundamental limits of computing efficiency. If we are, power is essentially instructions per second. If not, it’s less useful than instructions per second. You might as well just say instructions per second and ignore power.
For what it’s worth, I’m not unimpressed because of the computing power used. I’m unimpressed because of the inflexibility of the program. If we built a true AI using the combined might of every computer on the planet, that would be impressive. Limiting computing power makes creating intelligence harder, but no matter how much you have, it’s far from easy. Chess can be brute forced, but intelligence can’t.
(I know Watson isn’t a chess program, and is far more impressive than one, but it’s still nothing compared to true AI.)
I disagree that intelligence can’t be brute forced, at least if you don’t care about computational resources. Presumably, what we mean by ‘intelligence’ is the passing of some Turing test (otherwise, if you just define ‘intelligence’ to be passing a Turing test with some kind of “elegance” in the design of the program, then your claim is true but only because you defined it to be that way).
If computational resources truly weren’t bounded, then we could build a massively inefficient lookup table whose search grows exponentially (or worse) in the length of the input. See this paper and this one for arguments about how to bound the complexity of such a lookup table argument. This paper is also very useful and the writing style is great.
What you cannot do, however, is claim that intelligence cannot be brute forced (again, under the assumption we ignore resources), without some appeal to computational complexity theory.
In particular, the Aaronson paper points out that, in light of Searle's (flawed) Chinese room argument and Ned Block's criticisms of the Turing test, complexity theory puts us in a situation where it is exactly the efficiency of an algorithm that gives it the property we ascribe to intelligence. We only know something is intelligent because any reasonable Turing test that "respects" human intelligence will also function like a zero-knowledge proof that any agent who can pass the test is not running an algorithm that's exponential in the size of the input.
Watson achieves the necessary speed (for the greatly restricted test of playing Jeopardy), but as you mentioned, Watson is easy to unmask simply by asking for the second or third best answers. In terms of complexity theory, however, Watson's program fails the Turing test badly: its resource efficiency is dismal compared to a human's. It's doing something 'stupid', like a slow lookup with some correlations and statistical search. With resources similar to a human's, this approach would be doomed to failure, so IBM just scaled up the resources until this bad approach no longer failed on the desired test cases.
Thus, if you would want to use ‘intelligence’ to label a planet sized computer which solves human Turing tests by brute forcing them with tremendously outrageous resource consumption, this would be a fundamental departure from what the literature I linked above considers ‘intelligence.’ If the planet-sized computer did fancy, efficient algorithms, then its massive resources would imply it can blow away human performance. The Turing test should be testing for the “general capacity” to do something, whether by lookup table or some equally stupid way, or by efficient intelligence.
Complexity theory really plays a large role in all this. I would also add that I see no reason not to call a massive look-up table intelligent… assuming it is implemented in some kind of hardware and architecture that's much better than anything humans know about. If it turned out that human minds, for example, were some kind of quantum gravity look-up table (I absolutely do not believe this at all, but just for the sake of argument) I would not instantly believe that humans are not intelligent.
A planet-sized computer isn’t big enough to brute-force a Turing test. At least, it isn’t big enough to build a look-up table. Actually brute forcing a Turing test would require figuring out how the human would react to each possible output, in which case you’ve already solved AI.
If you had nigh infinite computing power you could create AIXI trivially. If you had quite a bit less, but still nigh infinite, computing power and programming time you could create a lookup table. If you had a planet-sized computer, you could probably create a virtual world in which intelligence evolves, though it would be far from trivial. Anything less than that, and it’s a nigh insurmountable task.
Increasing the computing power would make it easier, in that it wouldn’t make it harder, but within any reasonable bounds it’s not going to help much.
Why would the hardware matter?
Because regardless of software inefficiency, unless the hardware is able to produce solutions in real time, it can’t pass a Turing test. A massive look-up table would be just fine as an intelligence if it had the hardware throughput to do its exponential searches fast enough to answer in real time, the way a human does.
No, not even close. A planet sized computer is not even close to being up to that task, both the Parberry paper and Nick Bostrom’s simulation argument papers refute that quantitatively.
It depends on the Turing test. The Shieber paper shows that you are correct if the Turing test is “carry out a 5 minute long English conversation on unrestricted topics” but Watson says you’re wrong if the Turing test is “win at Jeopardy.” Both the test and the hardware matter.
Not true. It just requires coming up with an algorithmic shortcut that mimics a plausible human output. Just think of the Eliza chatbot that fooled people into believing it was a psychologist just by parroting their statements back as questions. Even after being told about it, many people refused to believe they had not just been talking to a psychologist … and yet, almost no one today would ascribe intelligence to Eliza.
A planet-sized computer doing dumb search in a lookup table built around some cute algorithmic tricks could mimic human output extremely well, especially in restricted domains. How would we unmask it as unintelligent? Suppose a company used it as a customer service chatbot for conversations never lasting more than 1 minute, and that its heuristics and massive search were adequate for coming up with appropriate replies to 99.999% of the conversations it would face of length up to 1 minute. As soon as you know the way its software works and that its ability is purely based on scaling up dumb software with lots of hardware, you’d declare it unintelligent. Prior to that, though, it would be indistinguishable from a human in 1 minute or less conversations.
Isn’t that like saying that air would be just fine as an intelligence, if you told a person the questions you were going to use during the Turing test and how much time you would take asking each one (and your hypothetical responses to their hypothetical responses, etc.) if only sound waves could be recorded and replayed at precisely the right time? Which they can be, though that is beside the point.
A look up table is functionally absolute evidence of intelligence without being at all intelligent just as air is for a Turing test between two humans.
I disagree. I think the section called “computation and waterfalls” in this paper makes a good case against this analogy.
I think my point was not clearly communicated, because that section is not relevant to this.
That section is about how just about any instance of a medium could be interpreted as just about any message by some possible minds. It picks an instance of a medium and fits a message and listener to the instance of a medium.
I am suggesting something much more normal: first you pick a particular listener (the normal practice in communication) and particular message, and I can manipulate the listener’s customary medium to give the particular listener the particular message. In this case, many different messages would be acceptable, so much the easier.
When administering a Turing test, why do you say it is the human on the other terminal that is intelligent, rather than their keyboard, or the administrator’s eyeballs? For the same reasons, a look up table is not intelligent.
First of all, the "systems reply" to Searle's Chinese room argument is exactly the argument that the whole room, Searle plus the book plus the room itself, is intelligent and does understand Chinese, regardless of whether or not Searle does. Since such a situation has never occurred with human-to-human interaction, it's never been relevant for us to reconsider whether what we think is a human interacting with us really is intelligent. It's easy to envision a future like Blade Runner where bots are successful enough that more sophisticated tests are needed to determine if something is intelligent. And this absolutely involves the speed of the hardware.
Also, how do you know that a person isn’t a lookup table?
Would you say that neurosurgery is “teaching”, if one manipulates the brain’s bits such that the patient knows a new fact?
Given the laws of physics, to which we assign very high probability, the probability that a person is a lookup table is low. If someone is a lookup table controlled remotely from another universe with incomprehensibly more matter than ours, or similar... so what? That just means an intelligence arranged the lookup table and it did not arise by random, high-entropy coincidence; one can say this with probability as close to absolute as it gets. Whatever arranged the lookup table may have arisen by a random, high-entropy process, like evolution, but so what?
Something arbitrarily slow may still be intelligent, by any normal meaning. More things are intelligent than pass the Turing test (unless it is merely offered as a definition) just as more things fly than are birds.
If the laws of physics are very different than I think they are, one could fit a lookup table inside a human-sized body. That would not make it intelligent any more than expanding the size of a human brain would make it cease to be intelligent. That wouldn’t prevent a robot from operating analogously to a human other than being on a different substrate, either.
What do you mean when you say “intelligence”? If you mean something performing the same functions as what we agree is intelligence given a contrived enough situation, I agree a lookup table could perform that function.
The problem with what I think is your definition isn't the physical impossibility of creating the lookup table, but that once the informational output to an input is as complex as it will ever be, any transformation happening afterwards isn't reasonably called intelligence. The whole system of the lookup table's creator and the lookup table may perhaps be described as an intelligent system, but not the fingers of the creator and the lookup table alone.
I’d hate to argue over definitions, but I’m interested in “Intelligence can be brute forced” and I wonder how common you think your usage is?
Yes, I am only considering the Turing test as a potential definition for intelligence, and I think this is obvious from the OP and all of my comments. See Chapter 7 of David Deutsch's new book, The Beginning of Infinity. Something arbitrarily slow can't pass a Turing test that depends on real time interaction, so complexity theory allows us to treat a Turing test as a zero-knowledge proof that the agent who passes it possesses something computationally more tractable than a lookup table. I also dismiss the lookup tables, but the reason why is that iterating conversation in a Turing test is Bayesian evidence that the agent interacting with me can't be using an exponentially slow lookup table.
I agree with you that a major component of intelligence is how the knowledge is embedded in the program. If the knowledge is embedded solely by some external creator, then we don’t want to label that as intelligent. But how do we detect whether creator-embedded knowledge is a likely explanation? That has to do with the hardware it is implemented on. Since Watson is implemented on such massive resources, the explanation that it produces answers from searching a store of data has more likelihood. That is more plausible because of Watson’s hardware. If Watson achieved the same results with much less capable hardware, it would make the hypothesis that Watson’s responses are “merely pre-sorted embedded knowledge” less likely (assuming I knew no details of the software that Watson used, which is one of the conditions of a Turing test).
If you tell me something can converse with me, but that it takes 340 years to formulate a response to any sentence I utter, then I strongly suspect the implementation is arranged such that it is not intelligent. Similarly, if you tell me something can converse with me, and it only takes 1 second to respond reasonably, but it requires the resources of 10,000 humans and can’t produce responses of any demonstrably better quality than humans, then I also suspect it is just a souped-up version of a stupid algorithm, and thus not intelligent.
The behavior alone is not enough. I need details of how the behavior happens, and if I’m lacking detailed explanations of the software program, then details about the hardware resources it requires also tell me something.
But it would mean that having a conversation with a person was not conclusive evidence that he or she wasn’t a lookup table implemented in a human substrate.
Yes, absolutely. “Regular” teaching is just exactly that, but achieved more slowly by communication over a noisy channel.
To strengthen DanielLC’s point, say we have a software program capable of beating the Turing test. In room A, it runs on a standard home desktop, and it is pitted against a whole-brain-emulation using several supercomputer clusters consuming on the order of ten megawatts.
In room B, the software program is run on a custom-built piece of hardware—a standard home desktop’s components interlinked with a large collection of heating elements, heatsinks attached to these heating elements, and cooling solutions for the enormous amount of heat generated, consuming on the order of ten megawatts. It is pitted against the person whom the whole-brain-emulation copy was made from—a whole-brain-emulation running on one whole brain.
It makes no sense that in room A the software wins and in room B the brain wins.
I agree that if it’s just hooked up to heat producing elements that play absolutely no role in the computation, then that waste heat or extra power consumption is irrelevant. But that’s not the case with any computer I’ve ever heard of and certainly not with Watson. The waste heat is directly related to its compute capability.
See the papers linked in my other comment for much more rigorous dismantling of the idea that resource efficiency doesn’t matter for a Turing test.
Also, the big fundamental flaw here is when you say:
You're acting like this is a function of the software with no concern for the hardware. A massive, inefficient, exponentially slow (software-wise) look-up table can beat the Turing test if you either (a) give it magically fast hardware or (b) give it extra time to finish. But this clearly doesn't capture the "spirit" of what you want. You want software that is somehow innately efficient in the manner it solves the problem. This is why most people would say a brute force look-up table is not intelligent, but a human is. Presumably the brain does something more resource efficient to generate responses the way that it does. But all 5-minute conversations, for example, can be bounded in terms of the total number of bits transmitted, so you could make a giant, unwieldy (but still finite) look-up table covering every possible 5-minute conversation that could ever happen, and just make something win the "have a 5 minute conversation" Turing test by doing a horrible search in that table.
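As a rough back-of-the-envelope for how large that finite table gets (the 5,000-character cap and the 7-bit encoding are my illustrative assumptions, not figures from the papers mentioned above):

```python
import math

# Suppose a 5-minute typed conversation is capped at 5,000 characters, each
# encoded in 7 bits (both numbers are illustrative assumptions).
chars = 5_000
bits = 7 * chars
digits = int(bits * math.log10(2)) + 1   # decimal digits in 2**bits

print(f"table needs up to 2^{bits} entries, a number with about {digits:,} digits")
# That exponent utterly dwarfs the ~10^80 atoms in the observable universe,
# which is why the lookup table, although finite, is physically unbuildable.
```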
When you say “make some software that beats a Turing test” this is == “make some software that does a task in a resource efficient manner”, as the Shieber and Aaronson papers point out. This is why Searle’s Chinese room argument utterly falls apart: computational complexity says you could never have large enough resources to actually build Searle’s Chinese room, nor the book for looking up Chinese characters. It’s the same with the “software” you mention. You might as well call it “magic software.”
I’ve found a link to some numbers on IBM’s site. They list a bunch of IBM products, including the Power 750 Express servers and IBM’s newest servers, the Power 795.
http://public.dhe.ibm.com/common/ssi/ecm/en/poy03032usen/POY03032USEN.PDF http://www-03.ibm.com/systems/power/hardware/compare.html?750_exp,795
If I’m reading these specs correctly, Watson’s 2,880 Power 7 cores and 16 TB of RAM that it got from 90 servers can be duplicated by 12 Power 795 servers. Is that actually accurate? Each 795 can hold 256 cores, and 256 × 12 = 3,072 > 2,880. And the RAM is easy, because each 795 can hold 8 TB of RAM. I believe the AIX® rPerf ranges also indicate substantial improvement: the Power 750 has 52.29 – 334.97, and the Power 795 has 273.51 – 2,978.16. But there might be other computer architecture specs I’m not reading correctly, and some of it is vague, like the reference to “Use I/O drawers.”
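A quick sketch of the arithmetic behind that server-count estimate (using only the spec figures quoted above, and ignoring I/O, interconnect, and the "I/O drawers" question entirely):

```python
import math

# Watson as deployed: 90 servers x 4 CPUs x 8 cores, with ~16 TB of RAM total.
watson_cores = 90 * 4 * 8            # = 2,880 cores
watson_ram_tb = 16

# Quoted Power 795 capacities.
p795_cores, p795_ram_tb = 256, 8

servers_for_cores = math.ceil(watson_cores / p795_cores)   # 12
servers_for_ram = math.ceil(watson_ram_tb / p795_ram_tb)   # 2

print(f"{watson_cores} cores -> {servers_for_cores} Power 795s for compute")
print(f"{watson_ram_tb} TB RAM -> {servers_for_ram} Power 795s for memory")
# The core count dominates: 12 fully populated 795s would cover both.
```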
Unfortunately, I don’t see power consumption figures for the 795, or I could check the power consumption of a 795 against the 750 and see how wattage efficiency changed. The 795 appears better than the 750, but if it’s just bigger, it’s a moot point. It does appear somewhat bigger, but I’m having a hard time breaking down how much of those improvements are from size and how much of that improvement is from other factors.
I stumbled across this link which is an interview of four prominent A.I. researchers, discussing A.I. through the context of the history of computer chess.
I think many of the comments made about “knowledge vs. search” and creativity vs. combinatorics are salient and related to many of the comments on this thread. I think there is some bolstering of the point of view that the “intelligence” of a program can’t be decoupled from its computational resources.