First, examining the dispute over whether scalable systems can actually implement a distributed AI...
This is one reason why even Google’s datastore, AFAIK, does not implement exactly this kind of architecture—though it is still heavily sharded. This type of a datastructure does not easily lend itself to purely general computation, either, since it relies on precomputed indexes, and generally exploits some very specific property of the data that is known in advance.
That’s untrue; Google App Engine’s datastore is not built on exactly this architecture, but is built on one with these scalability properties, and they do not inhibit its operation. It is built on BigTable, which builds on multiple instances of Google File System, each of which has multiple chunk servers. They describe this as intended to scale to hundreds of thousands of machines and petabytes of data. They do not define a design scaling to an arbitrary number of levels, but there is no reason an architecturally similar system like it couldn’t simply add another level and add on another potential roundtrip. I also omit discussion of fault-tolerance, but this doesn’t present any additional fundamental issues for the described functionality.
In actual application, its architecture is used in conjunction with a large number of interchangeable non-data-holding compute nodes which communicate only with the datastore and end users rather than each other, running identical instances of software running on App Engine. This layout runs all websites and services backed by Google App Engine as distributed, scalable software, assuming they don’t do anything to break scalability. There is no particular reliance of “special properties” of the data being stored, merely limited types of searching of the data which is possible. Even this is less limited than you might imagine; full text search of large texts has been implemented fairly recently. A wide range of websites, services, and applications are built on top of it.
The implication of this is that there could well be limitations on what you can build scalably, but they are not all that restrictive. They definitely don’t include anything for which you can split data into independently processed chunks. Looking at GAE some more because it’s a good example of a generalised scalable distributed platform, the software run on the nodes is written in standard Turing-complete languages (Python, Java, and Go) and your datastore access includes read and write by key and by equality queries on specific fields, as well as cursors. A scalable task queue and cron system mean you aren’t dependent on outside requests to drive anything. It’s fairly simple to build any such chunk processing on top of it.
So as long as an AI can implement its work in such chunks, it certainly can scale to huge sizes and be a scalable system.
And, as you also mentioned, even with these drastic tradeoffs you still get O(n log(n)).
And as I demonstrated, O(n log n) is big enough for a Singularity.
And now on whether scalable systems can actually grow big in general...
You mention Amazon (in addition to Google) as one example of a massively distributed system, but note that both Google and Amazon are already forced to build redundant data centers in separate areas of the Earth, in order to reduce network latency.
Speed of light as an issue is not a problem for building huge systems in general, so long as the number of roundtrips rises as O(n log n) or less, because for any system capable of at least tolerating roundtrips to the other side of the planet (few hundred milliseconds), it doesn’t become more of an issue as a system gets bigger, until you start running out of space on the planet surface to run fibre between locations or build servers.
The GAE datastore is already tolerating latencies sufficient to cover distances between cities to permit data duplication over wide areas, for fault tolerance. If it was to expand into all the space between those cities, it would not have the time for each roundtrip increase until after it had filled all the space between them with more servers.
Google and Amazon are not at all forced to build data centres in different parts of the Earth to reduce latency; this is a misunderstanding. There is no technical performance degradation caused by the size of their systems forcing them to need the latency improvements to end users or the region-scale fault tolerance that spread out datacentres permit. They can just afford it more easily. You could argue there are social/political/legal reasons they need it more, higher expectations of their systems and similar, but these aren’t relevant here. This spreading out is actually largely detrimental to their systems since spreading out this way increases latency between them, but they can tolerate this.
Heat dissipation, power generation, and network cabling needs all also scale as O(n log n), since computation and communication do and those are the processes which create those needs. Looking at my previous example, the amount of heat output, power needed, and network cabling required per amount of data processed will increase by maybe an order of magnitude in scaling such a system upwards by tens of orders of magnitude, 5x for 40 orders of magnitude in the example I gave. This assumes your base amount of latency is still enough to cover the distance between the most distant nodes (for an Earth bound system, one side of the planet to the other), which is entirely reasonable latency-wise for most systems; a total of 1.5 seconds for a planet-sized system.
This means that no, these do not become an increasing problem as you make a scalable system expand, any more so than provision of the nodes themselves. You are right in that that heat dissipation, power generation, and network cabling mean that you might start to hit problems before literally “running out of planet”, using up all the matter of the planet; that example was intended to demonstrate the scalability of the architecture. You also might run out of specific elements or surface area.
These practical hardware issues don’t really create a problem for a Singularity, though. Clusters exist now with 560k processors, so systems at least this big can be feasibly constructed at reasonable cost. So long as the software can scale without substantial overhead, this is enough unless you think an AI would need even more processors, and that the software could is the point that my planet-scale example was trying to show. You’re already “post Singularity” by the time you seriously become unable to dissipate heat or run cables between any more nodes.
This means that, even in an absolutely ideal situation where we can ignore power, heat dissipation, and network congestion, you will still run into the speed of light as a limiting factor. In fact, high-frequency trading systems are already running up against this limit even today.
HFT systems desire extremely low latency; this is the sole cause of their wish to be close to the exchange and to have various internal scalability limitations in order to improve speed of processing. These issues don’t generalise to typical systems, and don’t get worse at above O(n log n) for typical bigger systems.
It is conceivable that speed of light limitations might force a massive, distributed AI to have high, maybe over a second latency in actions relying on knowledge from all over the planet, if prefetching, caching, and similar measures all fail. But this doesn’t seem like nearly enough to render one at all ineffective.
There really aren’t any rules of distributed systems which says that it can’t work or even is likely not to.
First, examining the dispute over whether scalable systems can actually implement a distributed AI...
That’s untrue; Google App Engine’s datastore is not built on exactly this architecture, but is built on one with these scalability properties, and they do not inhibit its operation. It is built on BigTable, which builds on multiple instances of Google File System, each of which has multiple chunk servers. They describe this as intended to scale to hundreds of thousands of machines and petabytes of data. They do not define a design scaling to an arbitrary number of levels, but there is no reason an architecturally similar system like it couldn’t simply add another level and add on another potential roundtrip. I also omit discussion of fault-tolerance, but this doesn’t present any additional fundamental issues for the described functionality.
In actual application, its architecture is used in conjunction with a large number of interchangeable non-data-holding compute nodes which communicate only with the datastore and end users rather than each other, running identical instances of software running on App Engine. This layout runs all websites and services backed by Google App Engine as distributed, scalable software, assuming they don’t do anything to break scalability. There is no particular reliance of “special properties” of the data being stored, merely limited types of searching of the data which is possible. Even this is less limited than you might imagine; full text search of large texts has been implemented fairly recently. A wide range of websites, services, and applications are built on top of it.
The implication of this is that there could well be limitations on what you can build scalably, but they are not all that restrictive. They definitely don’t include anything for which you can split data into independently processed chunks. Looking at GAE some more because it’s a good example of a generalised scalable distributed platform, the software run on the nodes is written in standard Turing-complete languages (Python, Java, and Go) and your datastore access includes read and write by key and by equality queries on specific fields, as well as cursors. A scalable task queue and cron system mean you aren’t dependent on outside requests to drive anything. It’s fairly simple to build any such chunk processing on top of it.
So as long as an AI can implement its work in such chunks, it certainly can scale to huge sizes and be a scalable system.
And as I demonstrated, O(n log n) is big enough for a Singularity.
And now on whether scalable systems can actually grow big in general...
Speed of light as an issue is not a problem for building huge systems in general, so long as the number of roundtrips rises as O(n log n) or less, because for any system capable of at least tolerating roundtrips to the other side of the planet (few hundred milliseconds), it doesn’t become more of an issue as a system gets bigger, until you start running out of space on the planet surface to run fibre between locations or build servers.
The GAE datastore is already tolerating latencies sufficient to cover distances between cities to permit data duplication over wide areas, for fault tolerance. If it was to expand into all the space between those cities, it would not have the time for each roundtrip increase until after it had filled all the space between them with more servers.
Google and Amazon are not at all forced to build data centres in different parts of the Earth to reduce latency; this is a misunderstanding. There is no technical performance degradation caused by the size of their systems forcing them to need the latency improvements to end users or the region-scale fault tolerance that spread out datacentres permit. They can just afford it more easily. You could argue there are social/political/legal reasons they need it more, higher expectations of their systems and similar, but these aren’t relevant here. This spreading out is actually largely detrimental to their systems since spreading out this way increases latency between them, but they can tolerate this.
Heat dissipation, power generation, and network cabling needs all also scale as O(n log n), since computation and communication do and those are the processes which create those needs. Looking at my previous example, the amount of heat output, power needed, and network cabling required per amount of data processed will increase by maybe an order of magnitude in scaling such a system upwards by tens of orders of magnitude, 5x for 40 orders of magnitude in the example I gave. This assumes your base amount of latency is still enough to cover the distance between the most distant nodes (for an Earth bound system, one side of the planet to the other), which is entirely reasonable latency-wise for most systems; a total of 1.5 seconds for a planet-sized system.
This means that no, these do not become an increasing problem as you make a scalable system expand, any more so than provision of the nodes themselves. You are right in that that heat dissipation, power generation, and network cabling mean that you might start to hit problems before literally “running out of planet”, using up all the matter of the planet; that example was intended to demonstrate the scalability of the architecture. You also might run out of specific elements or surface area.
These practical hardware issues don’t really create a problem for a Singularity, though. Clusters exist now with 560k processors, so systems at least this big can be feasibly constructed at reasonable cost. So long as the software can scale without substantial overhead, this is enough unless you think an AI would need even more processors, and that the software could is the point that my planet-scale example was trying to show. You’re already “post Singularity” by the time you seriously become unable to dissipate heat or run cables between any more nodes.
HFT systems desire extremely low latency; this is the sole cause of their wish to be close to the exchange and to have various internal scalability limitations in order to improve speed of processing. These issues don’t generalise to typical systems, and don’t get worse at above O(n log n) for typical bigger systems.
It is conceivable that speed of light limitations might force a massive, distributed AI to have high, maybe over a second latency in actions relying on knowledge from all over the planet, if prefetching, caching, and similar measures all fail. But this doesn’t seem like nearly enough to render one at all ineffective.
There really aren’t any rules of distributed systems which says that it can’t work or even is likely not to.