An example of a 10+(!) year technology lead is computational discrete topology. Every large-scale geospatial, graph, etc. analysis system is based on it (you can't build one without it), but there is virtually no literature on how it works, and a practical expression of the theory is robustly non-obvious. The same few people continue to do the research and design every kernel for companies and governments. AGI and autonomous systems in particular currently drive much of the demand for this tech, since it is needed to reason about relationships and behaviors in space-time at scale.
There is no company behind this tech currently, but I've heard rumors of one being created. It could have a strong feedback loop, not just due to tech exclusivity but because a platform-level implementation would effectively provide a consensus model of physical reality for machines.
Tangentially, I am aware of AGI research programs working from first principles that have made impressive theoretical CS advances while staying completely under the radar. It is difficult to determine whether any of them has a 3+ year lead on other programs, though, since that assessment implies global visibility.
Distinguishing between a technological lead and ineffective competition is also important. An example is database engine technology. Some proprietary databases are orders of magnitude more efficient/scalable than any comparable open-source system, which looks like a qualitative difference, but is widely recognized as a product of design quality rather than any technological lead. (see also: Google's data infrastructure)
Seems untrue to me, and I’ve benchmarked dozens of databases for dozens of problems.
In the column-store space (optimized for aggregate analytics: distributed execution of aggregated queries, quick filtering based on ordering, and data compression), ClickHouse is the best there is in my experience. I made that point 4 years ago, but now you can find plenty of other benchmarks for it. It's used by many large-scale search engines and advertisers (other than Google) and, among others, by CERN.
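To make "quick filtering based on ordering and data compression" concrete, here is a toy sketch of the two ideas, assuming a single column that is already sorted on disk. It keeps a sparse index holding only the first value of each fixed-size granule, uses binary search over that index to skip granules that cannot match a range predicate, and run-length encodes a sorted low-cardinality column. The granule size and all function names are hypothetical; this is an illustration of the general technique, not how ClickHouse (or any other engine) actually implements it.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

GRANULE = 4  # hypothetical granule size; real engines use thousands of rows


def build_sparse_index(sorted_col, granule=GRANULE):
    """Record only the first value of each granule of an already-sorted column."""
    return [sorted_col[i] for i in range(0, len(sorted_col), granule)]


def range_filter(sorted_col, marks, lo, hi, granule=GRANULE):
    """Return (values with lo <= v <= hi, number of granules scanned).

    Because the column is sorted, only granules whose range can overlap
    [lo, hi] are read; everything else is skipped via the sparse index.
    """
    # Granule just before the first mark >= lo may still hold values == lo
    # (duplicates can straddle a granule boundary), so start one granule early.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose first value is already > hi cannot contain matches.
    last = bisect_right(marks, hi)  # exclusive
    start, end = first * granule, min(last * granule, len(sorted_col))
    hits = [v for v in sorted_col[start:end] if lo <= v <= hi]
    return hits, max(last - first, 0)


def run_length_encode(col):
    """Compress runs of equal adjacent values into (value, count) pairs.

    Sorting makes runs long, which is one reason ordering helps compression.
    """
    return [(v, len(list(g))) for v, g in groupby(col)]


if __name__ == "__main__":
    col = list(range(1, 13))
    marks = build_sparse_index(col)
    print(marks)                              # [1, 5, 9]
    print(range_filter(col, marks, 6, 7))     # ([6, 7], 1): 1 of 3 granules read
    countries = sorted(["fr", "de", "fr", "de", "us", "fr"])
    print(run_length_encode(countries))       # [('de', 2), ('fr', 3), ('us', 1)]
```

The filter reads one granule out of three for the query above; on a real table with billions of rows the same skipping, combined with compressed columns, is a large part of why aggregate scans are cheap.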
In the wide-column storage space, and more broadly in the "heavy filtering, large amounts of data" space, Cassandra (Facebook) and now Scylla seem to lead. I've never had to put dozens of petabytes in a database, but the few people who do need this seem to agree.
In the transactional space I haven't seen anyone bring a significant gain over Postgres and MariaDB yet.
For KV stores and in-memory caching you have Aerospike, RocksDB, and, more recently, systems based on TiKV. All have slightly different trade-offs, and all are open source. To be honest, I'm not even aware of proprietary products here.
Those four categories combined cover most of the use cases a database has.
So, I'm not saying I'm convinced I'm correct, but could you provide some examples to back up your claims? Name some names or, ideally, point to some use cases/domains where one could find benchmarks that demonstrate a proprietary database has the upper hand.