It’s easier to visualize if you try to work out the hierarchy of software agents you might use for this.
First, most of the bigger drones will probably be some kind of land vehicle, whether legged infantry or a robot on tracks. This is for obvious range and power reasons—a walking or rolling robot can carry far more weapons and armor than anything in the air. And in a battlespace where everyone on the enemy side has computer-controlled aim, flying drones without armor will likely only survive for mere seconds of exposure.
So at the bottom level, the drones need to be able to plan a route and locomote to a given location on the battlefield, or report that they are unable to reach it. (Due to inaccessibility, or damage to that particular unit—robotic units obviously won’t stop fighting when damaged.)
At a slightly higher level you have an agent that coordinates small “units” of drones to accomplish a mission from a finite set of trained “missions”.
Missions might be things like “clear this structure of enemy fighters”.
The agents at these two layers have been trained on, collectively, millions of years of simulated combat, with red-team agents controlling the simulated enemy drones or simulated human bodies. The red team will likely be more combat-effective at playing a human than typical actual human soldiers, so the trained policies of these agents will assume their opponent is doing the best that is possible.
We don’t know what the trained policy would look like, but I suspect it involves a lot of careful control of exposure angles and various unfair strategies.
The layer above the bottom agent doesn’t know how to perceive an environment, how to locomote, or how to compute ballistics—it queries the lower-level agent whenever it needs to know whether a proposed action is feasible.
Then the layer above this one handles the battle itself, creating units of action of appropriate size and assigning missions. This is where human generals coordinate. They might choose a section of city and drag a box over it, ordering the layer-3 agent to subdivide that city section and clear every building.
Each agent must query the layer below it to function, exporting these subtasks to an agent specialized in performing them.
Even the “level 1” agent doesn’t actually locomote directly; it’s similarly tiered internally.
The actual compute hardware is distributed such that many redundant vehicles run a VM-hosted copy of the agent they need plus the agent one level up. Agents are perfectly deterministic—given the same inputs and RNG seed they will always issue the same ‘orders’. This makes redundancy possible: multiple vehicles in parallel can run a perfect model of their ‘commander’ agent one level up, such that enemy fire destroying the vehicle that hosts a ‘commander’ will not degrade capabilities for even one frame.
(Each ‘frame’, several ‘commanders’ broadcast their orders, taking into account information from prior frames. Since the copies are deterministic, their ‘orders’ should all be identical; if they don’t match, the majority is used. So at any given time there are 3-5 sets of ‘orders’ being broadcast to all agents in this subswarm, and an incoming shot that blows up a commander, or jams its communication, leaves plenty of redundant copies of the ‘orders’.)
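A minimal sketch of that voting step (all names here are illustrative, and the strict-majority rule is my assumption rather than a spec):

```python
from collections import Counter

def consensus_orders(broadcasts: list[bytes]) -> bytes | None:
    """Pick the majority 'orders' among redundant commander broadcasts.

    Honest copies are deterministic and fed the same (observations, seed),
    so they should emit byte-identical orders; majority voting masks
    destroyed or jammed copies.
    """
    if not broadcasts:
        return None  # heard no commander this frame; keep acting on last orders
    orders, votes = Counter(broadcasts).most_common(1)[0]
    # Strict majority, so a single corrupted copy cannot win a tie.
    return orders if votes > len(broadcasts) // 2 else None
```

With a strict majority, the subswarm tolerates just under half of the copies being destroyed or jammed in a given frame before falling back to its last good orders.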
I think you’re grossly underestimating the following effects/issues:
1. How do multiple redundant commanders ensure that they reliably have the same information, much less in a battlefield environment? Our best efforts still ended up with Byzantine faults on the space shuttle, and that was with carefully designed wired connections… (see also Murphy Was an Optimist, which describes a 4-way split due to a failed diode).
2. How do commanders broadcast information in a manner that isn’t also broadcasting their location to enemies? (Honestly, the least important of these issues, and I was tempted not to include this lest you respond to this point and only this point.)
3. If many vehicles are constantly receiving enough information to make higher-level decisions, how do you prevent a compromised vehicle from also leaking said state to the enemy? Note the number of known attacks against TPMs, and note that homomorphic encryption is many orders of magnitude away from being feasible here. (And worse, requires a serial speedup in many cases to be feasible.)
4. If many vehicles have the deterministic agent algorithm, how do you prevent a compromised vehicle from leaking said algorithm in a manner the enemy can use for adversarial attacks of various sorts? Same notes as 3.
5. “Each agent must query the layer below it to function, exporting these subtasks to an agent specialized in performing them.” What you’re describing runs into exponential blowup in the number of queries in some cases. (For a simple example, note that sliding-block puzzles are PSPACE-complete, and consider what happens when each bottom agent is a single block that has to be feasibility-queried as to whether it can move.) Normally, I’d just say “sure, but you’re unlikely to run into those cases”; however, combat is rather necessarily adversarial.
The OpenAI 5 DOTA2 bot beating professionals received a lot of press. A random team who got ten wins against said bot, not so much. Beware glass jaws.
> in a battlespace where everyone on the enemy side has computer-controlled aim, flying drones without armor will likely only survive for mere seconds of exposure.
In a battlespace where everyone on the enemy side has computer-controlled aim, flying drones with armor will likely only survive for mere seconds of exposure. It may be better to have smaller drones, or more maneuverable drones, or quieter drones, or simply more drones, rather than more armored drones. (Or it may not. The point is it’s not as clearcut as you seem to make it out to be.)
(You may wish to look at discussions of battleships, and particularly battleship armor, versus missiles. And battleships are far less weight-constrained than fliers...)
So, note: I do work on embedded systems IRL, and have implemented many, many variations of messaging pipelines. It is true I have not implemented one this complex, but I don’t see any showstoppers.
This is how SpaceX does it right now. In summary, it’s fine for some of the “commanders” to miss entire frames, as “commanders” are stateless. Their algorithm is f([observations_this_frame | consensus_calculated_values_last_frame]). Resynchronizing when entire subnets get cut off for multiple frames and then reconnected is tricky, but straightforward. (A variation of the iterative algorithms you use for sensor fusion can fuse two belief spaces, i.e., two maps where each subnet has a different consensus view of a shared area of the state space.)
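For concreteness, a minimal version of that fusion, assuming (my assumption, not stated above) that each subnet keeps a per-cell log-odds occupancy grid over the shared area:

```python
import numpy as np

def fuse_beliefs(log_odds_a: np.ndarray, log_odds_b: np.ndarray,
                 prior_log_odds: float = 0.0) -> np.ndarray:
    """Fuse two subnets' occupancy-grid beliefs over the same area.

    Treating the two subnets' observations as independent evidence, the
    fused per-cell belief is the sum of each map's deviation from the
    shared prior (standard log-odds occupancy-grid fusion).
    """
    return log_odds_a + log_odds_b - prior_log_odds
```

(The independence assumption is the catch: observations the subnets shared before being cut off get double-counted, so a real implementation would need to track evidence provenance.)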
It does; there is no way to broadcast information that doesn’t reveal your position.
Please define TPM. Message payloads are fixed-length and encrypted with a shared key. I don’t really see an issue with the enemy gaining some information, because ultimately they need to have more vehicles armed with guns or they are going to lose; information does not provide much advantage.
Thermite, and battery-backed keystores. And the vehicles don’t have the actual source for the algorithms used to develop the system, just binaries and neural-network files. Assuming the enemy can bypass the self-destruct and exploit a weakness in the chip to access the key in the battery-backed keystore, they still have only binaries. This doesn’t give them the ability to use the technology themselves*. Moreover, the agents are using a near-optimal policy. A near-optimal-policy agent has nothing to exploit—they are not going to make any significant mistakes in battle you can learn from.
Nothing like this. The “commander” agent is guessing, from prior experience, optimal configurations in which to put its troops. The “subordinate” agent it queries runs on the same hardware node, so these requests are just IPC over shared memory. The commander makes a finite number of “informed” guesses, learns from the subordinate which “plans” are impossible, and selects the best plan from the remainder (with possibly some optimizing searches in the state space near the current best plans). This selects a plan from the set { winning battle configurations in the current situation | possible according to the subordinate } that is the best of a finite number of local maxima.
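A sketch of that propose-filter-select loop, where the interface names (`propose`, `is_feasible`, and so on) are placeholders rather than a real API:

```python
def select_plan(commander, subordinate, situation, n_proposals=32):
    """The commander samples candidate plans from its learned policy; the
    subordinate (same hardware node, so is_feasible() is cheap shared-memory
    IPC rather than a network round-trip) rejects the infeasible ones; the
    best survivor wins."""
    candidates = [commander.propose(situation) for _ in range(n_proposals)]
    feasible = [p for p in candidates if subordinate.is_feasible(p)]
    if not feasible:
        return None  # report upward: no proposed plan is achievable
    best = max(feasible, key=commander.estimate_value)
    # Optional: local optimizing search in the state space near the best plan.
    return commander.refine_locally(best)
```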
I am not sure I follow your “glass jaw” point. OpenAI is a research startup with a prototype agent; it can’t be expected to be flawless just because it uses AI techniques. Nor do I expect these military drones never to lose entire real-life battles to a software bug. The difference is that the bug can be patched, records of the battle can be reviewed and learned from, and the next set of drones can learn from the fallen as directly as if any uploaded experience had happened to them.
At the end there I am assuming you end up with varying sizes of land vehicle because they can carry hundreds of kilograms of armor and weapons. Flying drones do not have payload capacity within even the same order of magnitude. So you end up with what are basically battles of attrition between land vehicles of various scales, where flying drones are instantly shot out of the sky and are used for information gathering. (A legged robot that can open doors and climb stairs I am classifying as a land vehicle.)
Maybe it would go differently and end up being a war between artillery units at maximum range with swarms of flying drones used as spotters.
*I think this is obvious, but for every binary or neural-network file deployed in the actual machine in the field, there is a vast set of simulators, testing tools, and visualization interfaces that were needed to meaningfully work on such technology. This ‘backend’ is 99% of what you need to build these systems. If you don’t already have your own backend, you can’t develop and deploy your own drones.
> In summary, it’s fine to have some of the “commanders” miss entire frames as “commanders” are stateless.
Having a single commander miss an update? Sure. That’s not really the problem. The problem is cases like “half of the commanders got update A and half didn’t, which then results in a two-way split of the commanders, which then results in agents splitting into two halves semi-randomly, based on which way the majority fell among the subset of commanders that each agent can see”. You really should look up testing of distributed databases, because these sorts of split-brain scenarios are directly analogous there.
You’re also, I believe, currently falling afoul of the CAP theorem ( https://en.wikipedia.org/wiki/CAP_theorem ). Note that “commanders receiving the same set of observations_this_frame” is equivalent to a distributed database with all observers adding observations and all commanders seeing a consistent view of this database...
> Resynchronizing when entire subnets get cut off for multiple frames and then reconnected is tricky, but straightforward
Again, you really should look up testing of distributed databases. One particularly interesting scenario is asymmetric failures. That is, A can send to B but not vice versa.
> This is how SpaceX does it right now.
Yep. Consensus among multiple redundant computations is also how the space shuttle operated (although the details are somewhat different for the space shuttle of course). It’s not perfect, but it’s a fairly decent approach so long as failures are rare enough that multiple simultaneous failures are rare, and you are not in an adversarial environment.
> “commanders” are stateless.
Commanders cannot be stateless unless either a) they do not retain memories of previously-observed world state, or b) their full state is included in the broadcast world data every frame.
The former results in demonstrably suboptimal behavior (there’s a reason why humans have object permanence :-) ), and the latter requires that all agents be able to receive a full commander state in a single frame. (This in turn would result in your commander memory capacity being restricted by communication bandwidth.)
> Please define TPM.
My apologies. Trusted Platform Module. It’s a secure cryptoprocessor standard (and also a term for implementations of said standard, such as the ones on most x86 chips). It’s one of the more common attempts at secure computing, and has been attacked a fair few times as a result. (With more than a few successful attacks of various sorts.)
> Message payloads are fixed length and encrypted with a shared key.
Note that as soon as an attacker learns said shared key you’re now broadcasting all of your sensory information for all agents (as you have to in order to allow commanders to work as described) to the enemy. This is generally considered a Bad Idea (although you seem to disagree with this point, see below).
If you assume that information is not valuable, and that all agents broadcasting their position at all times is never harmful, I can see how this scheme of just broadcasting everything all the time can be useful. I disagree with the premise, however.
> I don’t really see an issue with the enemy gaining some information because ultimately they need to have more vehicles armed with guns or they are going to lose, information does not provide much advantage.
This is incorrect from a game-theoretical point of view. A single example is matching pennies ( https://en.wikipedia.org/wiki/Matching_pennies ). Normally the second person can at most break even on average. However, they can always win if they know what the first person’s penny is...
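A ten-line simulation makes the point concrete (illustrative code only):

```python
import random

def matcher_payoff(trials=100_000, leak=False):
    """Average payoff to the matching player, with and without a leak
    of the first player's choice."""
    total = 0
    for _ in range(trials):
        hidden = random.choice("HT")               # first player's penny
        guess = hidden if leak else random.choice("HT")
        total += 1 if guess == hidden else -1
    return total / trials

print(matcher_payoff(leak=False))  # ~0.0: break even at best
print(matcher_payoff(leak=True))   #  1.0: the leak is worth the entire game
```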
(As to how this maps to warfare? You might try playing a few games of Stratego ( https://en.wikipedia.org/wiki/Stratego ). Much of the game is the same sort of “reinforcements are moving to A and B but I don’t know which is the actual reinforcement and which is a feint quite yet” decisions.)
For a slightly more concrete example: I’ll quite happily play you a game of chess where I start without a queen and you have no knowledge of where any of my pieces are. (That is, you try a move, and if it’s legal that’s the move that’s taken. You do get knowledge of which pieces you have captured.)
Starting without a queen is a significant disadvantage in chess, so according to your assertion this should be an easy win for you. (A random online Stockfish engine gives +1260 centipawns; compare e.g. https://chesscomputer.tumblr.com/post/98632536555/using-the-stockfish-position-evaluation-score-to where even +500 centipawns gives a win percentage of >90%.)
> Moreover, the agents are using near optimal policy.
This is an interesting assertion. Do you have any citations showing that self-training results in near optimal policies in complex environments? In particular policies that remain near optimal in adversarial cases?
> This will select a plan chosen from the set of { winning battle configurations in the current situation | possible according to subordinate } that is the best of a finite number of local maxima.
There are a fair few optimization problems that you can run into that are proven to not have a polynomial-time approximation scheme unless P=NP (e.g. set covering).
As you can construct physical scenarios that map to said problems, this means that either:
1. You’ve solved P=NP, or are assuming that it is constructively proven that P=NP before this scenario happens.
2. You’re requiring that your commanders potentially do exp(num agents) work per timestep.
3. Your agents are not using near-optimal policies.
4. Your agents are using quantum computing, and using an algorithm within BQP ( https://en.wikipedia.org/wiki/BQP ).
5. You are unaware of this result.
Option 2 is typically physically impossible for more than a few agents, for the same reasons that symmetric-key cryptography can be secure.
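To make the set-covering example concrete, here is the textbook greedy approximation. It is polynomial-time, but only guaranteed to be within roughly a ln(n) factor of optimal, and essentially no polynomial-time algorithm can do better unless P = NP:

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly take the subset covering the most still-uncovered
    elements. Polynomial time; approximation factor ~ln(n)."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        best = max(subsets, key=lambda s: len(s & uncovered))
        if not (best & uncovered):
            raise ValueError("subsets cannot cover the universe")
        cover.append(best)
        uncovered -= best
    return cover

# A commander assigning drone units to cover objectives is one physical
# instance of this problem:
print(greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]))
# -> [{1, 2, 3}, {4, 5}]
```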
> The difference is that the bug can be patched
...assuming you haven’t already lost the war as a result. If the other side manages to take out 50% of your drones at once when you were previously at par, that’s a problem. Saying “we’ll get better for next time” only works if you’re still in a winnable position.
The main issue I am drawing attention to here when I talk about glass jaws is correlated failure modes in adversarial environments, in particular ones that can result in system-level catastrophic failure. If 10% of your drones fail on average, you can plan around that. If your critical holdpoints are suddenly all catastrophically lost due to the same silly bug or edge case, not so much.
> And the vehicles don’t have the actual source for the algorithms used to develop it, just binaries and neural network files.
This doesn’t actually matter all that much for adversarial attacks. Look at the cat-and-mouse game that is Denuvo for the binary side of things (and note that one of the major techniques used now is to run important code on a remote trusted server, which is not something the scheme you describe does), and note that if you have a (trainable) neural network you can differentiate it, which is all you need for adversarial attacks.
(Actually, you don’t even need that. See e.g. https://arxiv.org/abs/2003.04884 )
And yes, there are countermeasures. But there are also countercountermeasures, and so on. It’s a cat-and-mouse game, not a clearcut ‘defender wins’.
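For the neural-network half of that point: given any differentiable copy of the captured network, the classic fast-gradient-sign attack is a few lines. A sketch using PyTorch, assuming the captured policy is loadable as `model`:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, target, eps=0.03):
    """Fast Gradient Sign Method (Goodfellow et al., 2014): perturb the
    input by eps in the direction that most increases the network's loss.
    Differentiability is all it needs."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), target)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()
```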
It might be interesting to discuss this in a more interactive format, such as on https://discord.gg/GVkQF2Wn . You do know some stuff, I know some stuff, and we seem to be talking past each other. Fundamentally I think these problems are solvable.
(1) Merger of conflicting world spaces is possible. Or, if this turns out to be too complex to implement, you deterministically pick one network to be the primary one, and have it load the current observations from the subordinate network.
(2) If commanders need more memory than the communications channel can carry, they must exchange deltas. These deltas are the (common observations, taken as input from the subordinate platforms, plus state metadata). This is how a complex simulation like Age of Empires worked over a modem link (see the sketch after this list): https://www.gamedeveloper.com/programming/1500-archers-on-a-28-8-network-programming-in-age-of-empires-and-beyond
(3) Free-air laser links are one technology that would at least somewhat obscure the source of the signaling (laser light will probably reflect in a detectable way around corners, but it won’t go through solid objects) and are capable of tens of gigabits per second of bandwidth, enough to satisfy some of your concerns.
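Here is the sketch promised above: a deterministic-lockstep skeleton of the delta-exchange idea in (2), in the style of the Age of Empires model (names are illustrative, not a real implementation):

```python
import hashlib

class LockstepPeer:
    """Peers exchange only each frame's inputs (the 'delta'), never world
    state. Every peer applies the same inputs to the same deterministic
    simulation; exchanged state hashes detect divergence (a desync)."""

    def __init__(self, state, step_fn):
        self.state = state       # full world state, kept locally
        self.step = step_fn      # deterministic: (state, inputs) -> state

    def advance(self, frame_inputs: bytes, peer_hash=None) -> bytes:
        self.state = self.step(self.state, frame_inputs)
        digest = hashlib.sha256(repr(self.state).encode()).digest()
        if peer_hash is not None and peer_hash != digest:
            raise RuntimeError("desync: states diverged despite identical inputs")
        return digest
```

(When a frame’s inputs from some peer are missing, the simulation has to stall until they arrive; that is the price of consistency here.)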
I’m not the Mailman, but I’m up there. I tend to write out a sketch, then go back and ponder it a while, massaging it into some semblance of order, deleting/modifying arguments that in retrospect don’t work, and inflating it out into a quasi-coherent post in the process. This takes a fair bit of time. It works well in an asynchronous context. It does not work well in a synchronous context. In my experience, when I attempt to discuss in a synchronous context I end up with one of the following two things (or both!):
1. I state arguments or views that are insufficiently thought-out and that are obviously incorrect/inconsistent in retrospect, or are misleading/confusing/weaker than they should be.
2. I end up with essentially just a forum discussion that happens to be on Discord. Walls of text and all.
The second would be fine, but this then runs into another issue:
Much of the reason why I am on a website like this is so that people can follow arguments / point out issues with my views / etc. Partly for the later benefit of others following my chains of logic. Partly for the later benefit of others when they can refer back to my chains of logic. Partly for the later benefit of myself, when someone down the line sees an old comment of mine and replies with something I hadn’t thought of. And partly for the later benefit of myself, when I can refer back to my chains of logic.
Discord does not achieve these.
Someone searching this site does not see a Discord conversation. If you (or whoever owns the room, rather) close the Discord room, then the information is lost. (Or if e.g. Discord decides six months down the line to start dropping old conversation history, etc., etc.) You can, somewhat awkwardly, archive a Discord conversation and, say, post it on this site. But that’s now just a forum conversation with extra steps (not to mention that it’s now associated with the person who posted the transcript, not the people in the transcript: if you post the transcript and someone replies to a comment of mine in it, I don’t get a notification).
> (2) If commanders need more memory than the communications channel can carry, they must exchange deltas.

You previously stated that ‘“commanders” are stateless’. Do commanders have state here?
If commanders are stateless, this technique does not work as they have nothing to base the deltas on.
If commanders are stateful, and are maintaining a worldstate by deterministically applying deltas… you’re right back to CAP theorem limitations. Pick at least one of a) diverging worldstates in the presence of network partitions, b) the (bad) assumption of no network partitions, and/or c) arbitrarily-long stalls in the presence of network partitions.
(AoE falls under c) here, if you’re wondering. In networked gaming, sacrificing consistency in the presence of a network partition is called a desync and is a Bad Thing(TM).)
> (1) Merger of conflicting world spaces is possible.

Sure. This is just eventual consistency, restated. With all of the wrinkles that eventually-consistent distributed databases have.
To go back to your game example for a moment. You’ve got a 2-on-1 match—AB versus X.
A, B and X each have an army.
X’s army wins ties. So in a fight between X and A or X and B, X wins. But in a fight between X and AB, X loses.
X’s army is faster. If both sides’ bases are destroyed, X wins, as X destroys AB’s bases first.
X has three options: defend, or attack through one of two chokepoints—path P, that A has info on, and path Q, that B has info on.
Ditto, A and B each have three options: defend P, defend Q, or attack.
X went eco-heavy, and so AB will lose if they neither manage to destroy X’s base nor destroy X’s army.
This is essentially a coordination game of sorts, with an additional wrinkle that neither A nor B has the full picture of what’s going on.
In the presence of reliable, prompt communication between A and B, this is a reliable win for AB. A and B relay information on P and Q to each other, and between them check both paths. If P or Q has X’s army, A and B send both armies there and destroy it. Otherwise, X’s army must be defending, and A and B send both armies to X’s base, destroying it.
Now let’s say that instead the network connection is lost between A and B. A checks and sees that P does not have X’s army, but has no info on Q. This means that A must either send its army to Q, or X. But which one? It could be either. Say it sends it to X.
B checks and sees that Q does have X’s army, but has no info on P (not that it matters in this case). This means that B must send its army to Q, so it does.
Some time later, the network comes back online. A and B consolidate their world-state just fine. Annddd… A’s army is attacking instead of defending Q, and B’s army and then their bases are dying in the meantime, and they lose. X has a 50% chance of winning here simply by making a random choice between defending and attacking through P, if communication is disrupted at said critical point. (Or between defending and attacking through Q. Either works.)
Eventual consistency is not enough.
> (3) Free-air laser links are one technology that would at least somewhat obscure the source of the signaling...

Laser links are wonderful when a) they are through clear air and b) you have a stable alignment between the two endpoints. They fall apart (in the sense of hilariously low channel capacity) in the presence of turbulence / fog / rain / snow / smog / dust / smoke / physical obstructions, or when the endpoints are changing alignment. For many applications this is fine. For rapidly-moving drones in a battlefield environment, where if the enemy knows that they can disrupt you at a critical moment with a smoke bomb they absolutely will, not so much. (To an extent you can compensate for some of this by upping the laser power and adding more elaborate beam tracking of various sorts… but a) a small flying craft doesn’t exactly have spare energy or mass, and b) you now have enough scattering that the beam is easily detectable.) (Note that it’s not just attenuation that’s the issue. It’s also dispersion.)
(You can kind of get away with unstable but predictable and smooth alignment, e.g. tracking satellites. But the equipment for this is not exactly easily fitted on a small drone, and drone movement in a battlefield environment is not exactly smooth.)
Oh, and laser links are also point-to-point, which increases effective latency compared to a broadcast system (as there’s a limited number of transmitters and receivers on any one craft, the current leader cannot directly receive updates from everyone, even if there’s the bandwidth available. It has to be bounced/consolidated through relays, which adds latency).
So TLW, at the end of the day, all your objections are of the form “this method isn’t perfect” or “this method runs into issues that are fundamental theorems”.
And you’re right. I’m taking the perspective of someone who, having built smaller-scale versions of networked control systems using a slightly lossy interface and an atomic state-update mechanism, says “we can make this work”.
I guess that’s the delta here. Everything you say as an objection is correct. It’s just not sufficient.
At the end of the day, we’re talking about a collection of vehicles. Each is somewhere between the size of a main battle tank and a human-sized machine that can open doors and climb stairs. All likely use fuel cells for power. All have racks of compute boards, likely ARM-based SoCs, likely using TPUs or on-die coprocessors. Hosted on these boards is a software stack. It is very complex, but at a simple level it does:
perception → state representation → potential action set → H(potential action set) → max(H) → actuators.
That H, evaluating a potential action, takes into account (estimates of loss, isActionAllowed, gain_estimate(mission_parameters), gain_estimate(orders)).
It will not take an action if it is not allowed (for example, if weapons are disabled it will not plan to use them). It will avoid actions with predicted loss unless the gain is high enough (for example, it won’t normally jump out a window, but if $HIGH_VALUE_TARGET is escaping around the corner, the machine should and will jump out the window, firing in midair before it is mission-killed on impact, when the heuristic is tuned right).
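A sketch of that loop, where every name is a placeholder for a learned or engineered component rather than a real API:

```python
def machine_tick(perceive, propose_actions, actuate,
                 is_action_allowed, loss_estimate, gain_estimate,
                 mission_parameters, orders):
    """One tick: perception -> state representation -> potential action
    set -> H(action) -> max(H) -> actuators."""
    state = perceive()
    allowed = [a for a in propose_actions(state) if is_action_allowed(a)]

    def H(action):
        # Predicted mission gain plus orders gain, minus predicted loss.
        # Tuning these relative weights is the jump-out-the-window tradeoff.
        return (gain_estimate(action, mission_parameters)
                + gain_estimate(action, orders)
                - loss_estimate(action))

    if allowed:
        actuate(max(allowed, key=H))
```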
So each machine is fighting on its own: able to kill enemy fighters on its own, assassinate VIPs, and avoid firing on civilians unless the reward is high enough. [It will fire through civilians if the predicted gain is set high enough. These machines are of course amoral, and human operators setting “accomplish at all costs” for a goal’s priority will cause many casualties.]
The coordination layer is small in data, except maybe for map updates. Basically, the “commanders” are nodes that run in every machine; they all share software components where the actual functional block is ‘stateless’ as mentioned. Just because there is a database with cached state, and you send (delta, hash) each frame, in no way invalidates this design. What stateless means is that the “commander” gets (data from last frame, new information) and makes a decision based only on those arguments. At the OS level this is just a binary running in its own process space whose memory, after each frame, is in the same state it started in. [It wrote the outputs to shared memory, having read the inputs from read-only memory.]
This is necessary if you want multiple-computer redundancy, or software you can even debug. FYI, I actually do this; this part is present-day.
Anyways, in situations where the “commander” doesn’t work for any of the reasons you mention... this doesn’t change a whole lot. Each machine is now just fighting on its own, or in a smaller group, for a while. They still have their last orders.
If comm losses are common and you have a much larger network, the form you issue orders in—one that limits the autonomy of ever smaller subunits—might be a little interesting.
I think I have updated a little bit. From thinking about this problem, I do agree that you need the software stacks to be highly robust to network link losses, to breaking into smaller units, to momentary rejoins not sufficient to send map updates, and so on. This would be a lot of effort and would take years of architecture iteration and testing. There are some amusing bugs you might get, such as one small subunit having seen an enemy fighter sneak by in the past, then, when the units resync with each other, failing to report this because the sync algorithm flushes anything not relevant to the immediate present state and objectives.