‘Downloading yourself into the internet is not a one-second process, even with connection speeds which are likely to be fast at the server where this AI would be, if it were developed by a large lab.’
Yes, no one is developing cutting-edge AIs like GPT-5 off your local dinky Ethernet, and your crummy home cable modem, choked by your ISP, is highly misleading if that’s what you think of as ‘the Internet’. The real Internet is way faster, particularly in the cloud. Stuff in the datacenter can do things like access another server’s RAM using RDMA in a fraction of a millisecond, vastly faster than your PC can even talk to its own hard drive. This is because datacenter networking is serious business: it’s always high-end Ethernet or, better yet, InfiniBand. And because interconnect is one of the most binding constraints on scaling GPU clusters, any serious GPU cluster is using the best InfiniBand it can get.
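To put rough numbers on that comparison, here is a minimal sketch; every figure is an assumed typical order of magnitude, not a measurement from any particular cluster:

```python
# Rough latency orders of magnitude for the comparison above.
# All figures are assumed typical values, not measurements.
latency_us = {
    "RDMA read of remote RAM over InfiniBand": 2,       # single-digit microseconds
    "local NVMe SSD random read":              100,     # ~0.1 ms
    "spinning hard drive seek":                8_000,   # ~8 ms
    "home broadband round trip to a website":  30_000,  # ~30 ms
}
for op, us in sorted(latency_us.items(), key=lambda kv: kv[1]):
    print(f"~{us:>6,} µs  {op}")
```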
WP cites the latest InfiniBand at 100–1200 gigabits/second (12.5–150 gigabytes/s) point-to-point; with Chinchilla scaling yielding models on the order of 100 gigabytes, compression & quantization cutting model size by a further factor, and the ability to transmit from multiple storage devices at once and to send only shards to individual servers (which is how the model will probably run anyway), it is actually not out of the question for ‘downloading yourself into the internet’ to be a 1-second process today.
(Not that it would matter if these numbers were 10x off. If you can’t stop a model from exfiltrating itself in 1s, then you weren’t going to somehow catch it if it actually takes 10s.)
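A back-of-envelope sketch of that arithmetic, with every parameter an illustrative assumption matching the discussion above rather than a measurement:

```python
# Back-of-envelope exfiltration time. Every parameter is an assumed
# illustrative value, not a measurement.
model_gb       = 100   # ~Chinchilla-scale model weights, in gigabytes
compression    = 2     # assumed factor from quantization/compression
link_gb_per_s  = 150   # top-end InfiniBand: ~1200 Gb/s = 150 GB/s point-to-point
parallel_links = 4     # shards streamed from several storage servers at once

payload_gb = model_gb / compression
seconds = payload_gb / (link_gb_per_s * parallel_links)
print(f"~{seconds:.2f} s to move {payload_gb:.0f} GB")  # ~0.08 s

# Pessimistic case: a single 100 Gb/s (12.5 GB/s) link, no compression:
print(f"~{model_gb / 12.5:.0f} s")                      # ~8 s
```

Either way the answer lands within an order of magnitude of one second, which is the point of the parenthetical above.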