I’m not the Mailman, but I’m up there. I tend to write out a sketch, then go back and ponder it a while, massaging it into some semblance of order, deleting/modifying arguments that in retrospect don’t work, and inflating it out into a quasi-coherent post in the process. This takes a fair bit of time. It works well in an asynchronous context. It does not work well in a synchronous context. In my experience, when I attempt to discuss in a synchronous context I end up with one of the following two things (or both!):
1. I state arguments or views that are insufficiently thought-out and that are obviously incorrect/inconsistent in retrospect, or are misleading/confusing/weaker than they should be.
2. I end up with essentially just a forum discussion that happens to be on Discord. Walls of text and all.
The 2nd would be fine, but this then runs into another issue:
Much of the reason why I am on a website like this is so that people can follow arguments / point out issues with my views / etc. Partly for the later benefit of others following my chains of logic. Partly for the later benefit of others when they can refer back to my chains of logic. Partly for the later benefit of myself, when someone down the line sees an old comment of mine and replies with something I hadn’t thought of. And partly for the later benefit of myself, when I can refer back to my chains of logic.
Discord does not achieve these.
Someone searching this site does not see a Discord conversation. If you (or whoever owns the room, rather) close the Discord room, then the information is lost. (Or if e.g. Discord decides 6m down the line to start dropping old conversation history, etc, etc.) You can, somewhat awkwardly, archive a Discord conversation. And, say, post it on this site. But that’s now just a forum conversation with extra steps (not to mention that it’s now associated with the person who posted the transcript, not the people in the transcript. If you post the transcript and someone replies to a comment of mine in it, I don’t get a notification.).
(2) If commanders need more memory than the communications channel, they must exchange deltas.
You previously stated that ‘“commanders” are stateless’. Do commanders have state here?
If commanders are stateless, this technique does not work as they have nothing to base the deltas on.
If commanders are stateful, and are maintaining a worldstate by deterministically applying deltas… you’re right back to CAP theorem limitations. Pick at least one of a) diverging worldstates in the presence of network partitions, b) the (bad) assumption of no network partitions, and/or c) arbitrarily-long stalls in the presence of network partitions.
(AoE falls under c) here, if you’re wondering. In networked gaming, sacrificing consistency in the presence of a network partition is called a desync and is a Bad Thing(TM).)
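To make option a) concrete, here is a minimal Python sketch. It assumes the worldstate is a plain dict and a delta is a (key, value) pair; `apply_deltas` and the sample values are hypothetical illustrations, not anything from the system under discussion.

```python
def apply_deltas(state, deltas):
    """Deterministically apply (key, value) deltas to a copy of the state."""
    new_state = dict(state)
    for key, value in deltas:
        new_state[key] = value
    return new_state


start = {"P": "unknown", "Q": "unknown"}
deltas_from_a = [("P", "clear")]  # what A observed (made-up sample data)
deltas_from_b = [("Q", "enemy")]  # what B observed (made-up sample data)

# Connected: both replicas see the same delta stream and end up identical.
a_connected = apply_deltas(start, deltas_from_a + deltas_from_b)
b_connected = apply_deltas(start, deltas_from_a + deltas_from_b)

# Partitioned: each replica only sees its own deltas, so the worldstates diverge.
a_partitioned = apply_deltas(start, deltas_from_a)
b_partitioned = apply_deltas(start, deltas_from_b)

print(a_connected == b_connected)      # True
print(a_partitioned == b_partitioned)  # False
```

Deterministic application only guarantees agreement when both sides receive the same deltas. Under a partition the remaining choices are b) assuming it cannot happen or c) having each side wait until it hears from the other.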
(1) merger of conflicting world spaces is possible.
Sure. This is just eventual consistency, restated. With all of the wrinkles that eventually-consistent distributed databases have.
To go back to your game example for a moment. You’ve got a 2-on-1 match—AB versus X.
A, B and X each have an army.
X’s army wins ties. So in a fight between X and A or X and B, X wins. But in a fight between X and AB, X loses.
X’s army is faster. If both sides’ bases are destroyed, X wins, as X destroys AB’s bases first.
X has three options: defend, or attack through one of two chokepoints (path P, that A has info on, and path Q, that B has info on).
Ditto, A and B each have three options: defend P, defend Q, or attack.
X went eco-heavy, and so AB will lose if they neither manage to destroy X’s base nor destroy X’s army.
This is essentially a coordination game of sorts, with an additional wrinkle that neither A nor B has the full picture of what’s going on.
In the presence of reliable prompt communication between A and B, this is a reliable win for AB. AB relay information on P and Q to each other, and both check paths P and Q. If P or Q has X’s army, A and B send both armies there and destroy it. Otherwise, X’s army must be defending and A and B send both armies to X’s base, destroying it.
Now let’s say that instead the network connection is lost between A and B. A checks and sees that P does not have X’s army, but has no info on Q. This means that A must either send its army to Q, or X. But which one? It could be either. Say it sends it to X.
B checks and sees that Q does have X’s army, but has no info on P (not that it matters in this case). This means that B must send its army to Q, so it does.
Some time later, the network comes back online. A and B consolidate their world-state just fine. Annddd… A’s army is attacking instead of defending Q, and B’s army and then their bases are dying in the meantime, and they lose. X has a 50% chance of winning here simply by choosing at random between defending (staying at X) and attacking through P, if communication is disrupted at said critical point. (Or between defending and attacking through Q. Either works.)
Eventual consistency is not enough.
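The 50% figure can be checked by enumerating the game directly. This is a minimal Python sketch under my own formalization of the rules above: AB win only if both armies end up where X’s army is (or both attack when X defends), and a player who sees X’s army on their own path stays to defend it. The names `ab_wins`, `with_comms` and `partitioned_win_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def ab_wins(x_choice, a_move, b_move):
    """AB win only if both armies go where they are needed."""
    target = "attack" if x_choice == "defend" else x_choice
    return a_move == target and b_move == target


def with_comms(x_choice):
    """With a working link, both players know X's choice and converge."""
    target = "attack" if x_choice == "defend" else x_choice
    return ab_wins(x_choice, target, target)


def partitioned_win_prob(a_attack, b_attack, x_mix):
    """AB's win probability with no link between A and B.

    a_attack / b_attack: probability that A / B attacks X's base when
    their own path looks empty (otherwise they reinforce the other path).
    """
    total = Fraction(0)
    for x_choice, p_x in x_mix.items():
        a_dist = ({"P": Fraction(1)} if x_choice == "P"
                  else {"attack": a_attack, "Q": 1 - a_attack})
        b_dist = ({"Q": Fraction(1)} if x_choice == "Q"
                  else {"attack": b_attack, "P": 1 - b_attack})
        for (a_move, p_a), (b_move, p_b) in product(a_dist.items(), b_dist.items()):
            if ab_wins(x_choice, a_move, b_move):
                total += p_x * p_a * p_b
    return total


half = Fraction(1, 2)
x_mix = {"defend": half, "P": half}  # X flips a coin between defending and P
grid = [Fraction(k, 4) for k in range(5)]
best = max(partitioned_win_prob(a, b, x_mix) for a in grid for b in grid)
print(all(with_comms(x) for x in ("defend", "P", "Q")))  # True
print(best)  # 1/2
```

With the link up, AB win every case. With the link down, no mix of guesses on this grid gets AB above one half against X’s coin flip, and merging world-states afterwards does not undo the moves already made.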
(3) Free air laser links are one technology that would at least somewhat obscure the source of the signaling (laser light will probably reflect in a detectable way around corners but it won’t go through solid objects) and are capable of tens of gigabits per second of bandwidth, enough to satisfy some of your concerns.
Laser links are wonderful when a) they are through clear air and b) you have a stable alignment between the two endpoints. They fall apart (in the sense of hilariously low channel capacity) in the presence of turbulence / fog / rain / snow / smog / dust / smoke / physical obstructions, or when the endpoints are changing alignment. For many applications this is fine. For rapidly-moving drones in a battlefield environment, where if the enemy knows that they can disrupt you at a critical moment with a smokebomb they absolutely will, not so much. (To an extent you can compensate for some of this by upping the laser power and adding more elaborate beam tracking of various sorts… but a) a small flying craft doesn’t exactly have spare energy or mass, and b) you now have enough scattering that the beam is easily detectable.) (Note that it’s not just attenuation that’s the issue. It’s also dispersion.)
(You can kind of get away with unstable but predictable and smooth alignment, e.g. tracking satellites. But the equipment for this is not exactly easily fitted on a small drone, and drone movement in a battlefield environment is not exactly smooth.)
Oh, and laser links are also point-to-point, which increases effective latency compared to a broadcast system (as there’s a limited number of transmitters and receivers on any one craft, the current leader cannot directly receive updates from everyone, even if there’s the bandwidth available. It has to be bounced/consolidated through relays, which adds latency).
So TLW, at the end of the day, all your objections are in the form of “this method isn’t perfect” or “this method will have issues that are fundamental theorems”.
And you’re right. I’m taking the perspective of, having built smaller scale versions of networked control systems, using a slightly lossy interface and an atomic state update mechanism, “we can make this work”.
I guess that’s the delta here. Everything you say as an objection is correct. It’s just not sufficient.
At the end of the day, we’re talking about a collection of vehicles. Each is somewhere between the size of a main battle tank and a human-sized machine that can open doors and climb stairs. All likely use fuel cells for power. All have racks of compute boards, likely Arm-based SoCs, likely using TPUs or on-die coprocessors. Hosted on these boards is a software stack. It is very complex, but at a simple level it does:
perception → state representation → potential action set → H(potential action set) → max(H) → actuators.
That H, how it evaluates a potential action, takes into account (estimates of loss, isActionAllowed, gain_estimate(mission_parameters), gain_estimate(orders)).
It will not take an action if not allowed. (Example: if weapons are disabled, it will not plan to use them.) It will avoid actions with predicted loss unless the gain is high enough. (Example: it won’t normally jump out a window, but if $HIGH_VALUE_TARGET is escaping around the corner, the machine should and will jump out a window, firing in midair before it is mission killed on impact, when the heuristic is tuned right.)
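The selection step described above can be sketched in a few lines of Python. This assumes, purely for illustration, that H combines its terms additively (gains minus estimated loss) and that disallowed actions are filtered out before scoring. The `Action` fields, the `choose` helper and every number are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    allowed: bool         # stands in for isActionAllowed
    loss_estimate: float  # predicted cost of taking the action
    mission_gain: float   # gain_estimate(mission_parameters)
    orders_gain: float    # gain_estimate(orders)


def H(action):
    """Score one candidate action (additive form is an assumption)."""
    return action.mission_gain + action.orders_gain - action.loss_estimate


def choose(actions):
    """Return the allowed action with the highest H, or None if none is allowed."""
    return max((a for a in actions if a.allowed), key=H, default=None)


normal = [
    Action("hold_position", True, 0.0, 1.0, 0.0),
    Action("jump_out_window", True, 100.0, 0.0, 0.0),
    Action("use_weapons", False, 0.0, 50.0, 0.0),  # weapons disabled
]
pursuit = [
    Action("hold_position", True, 0.0, 1.0, 0.0),
    Action("jump_out_window", True, 100.0, 0.0, 500.0),  # orders raise the gain
]
print(choose(normal).name)   # hold_position
print(choose(pursuit).name)  # jump_out_window
```

The disallowed action never wins no matter how high its gain, and the costly action only wins once the orders push its gain above the predicted loss.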
So each machine is fighting on its own, able to kill enemy fighters on its own, assassinate VIPs, avoid firing on civilians, unless the reward is high enough [it will fire through civilians if the predicted gain is set high enough. These machines are of course amoral and human operators setting “accomplish at all costs” for a goal’s priority will cause many casualties].
The coordination layer is small in data, except for maybe map updates. Basically the “commanders” are nodes that run in every machine; they all share software components where the actual functional block is ‘stateless’ as mentioned. Just because there is a database with cached state and you send (delta, hash) each frame in no way invalidates this design. What stateless means is that the “commander” gets (data from last frame, new information) and will make a decision based only on the arguments. At an OS level this is just a binary running in its own process space whose memory, after each frame, is in the same state it started in. [it wrote the outputs to shared memory, having read the inputs from read only memory]
This is necessary if you want to have multiple computer redundancy, or software you can even debug. FYI I actually do this, this part’s present day.
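As a rough illustration of what “stateless” means here (not the actual implementation), a commander step can be written as a pure function of its arguments. In this Python sketch the state is a dict of strings and the hash is used to detect divergence from the sender; both are my assumptions, and `state_hash`, `commander_step` and the decision strings are hypothetical.

```python
import hashlib
import json


def state_hash(state):
    """Stable hash of a JSON-serializable worldstate."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def commander_step(prev_state, delta, expected_hash):
    """One frame: (data from last frame, new information) -> (new state, decision).

    Keeps nothing between calls and does not mutate its inputs.
    """
    new_state = {**prev_state, **delta}
    in_sync = state_hash(new_state) == expected_hash
    decision = "use_shared_plan" if in_sync else "fall_back_to_last_orders"
    return new_state, decision


prev = {"P": "clear"}
delta = {"Q": "enemy"}
good_hash = state_hash({"P": "clear", "Q": "enemy"})

state, decision = commander_step(prev, delta, good_hash)
print(decision)  # use_shared_plan
print(commander_step(prev, delta, "stale-hash")[1])  # fall_back_to_last_orders
```

Because the output depends only on the arguments, a recorded frame can be replayed on another computer or in a debugger and must give the same result. The cached state lives outside the function.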
Anyways, in situations where the “commander” doesn’t work for any of the reasons you mention... this doesn’t change a whole lot. Each machine is now just fighting on its own or in a smaller group for a while. They still have their last orders.
If comm losses are common and you have a much larger network, the form you issue orders in—that limits the autonomy of ever smaller subunits—might be a little interesting.
I think I have updated a little bit. From thinking about this problem, I do agree that you need the software stacks to be highly robust to network link losses, breaking into smaller units, momentary rejoins not sufficient to send map updates, and so on. This would be a lot of effort and would take years of architecture iteration and testing. There are some amusing bugs you might get, such as one small subunit having seen an enemy fighter sneak by in the past and then, when the units resync with each other, failing to report this because the sync algorithm flushes anything not relevant to the immediate present state and objectives.
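That resync bug is easy to reproduce in miniature. The Python sketch below approximates “relevant to the immediate present” with a simple recency window, which is my assumption. The `Observation` type, both merge functions and the timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    time: int
    what: str


def merge_flushing(local, remote, now, horizon):
    """Merge two logs but drop anything older than `horizon` frames."""
    merged = set(local) | set(remote)
    recent = (o for o in merged if now - o.time <= horizon)
    return sorted(recent, key=lambda o: o.time)


def merge_keeping_history(local, remote):
    """Merge two logs and keep every observation."""
    return sorted(set(local) | set(remote), key=lambda o: o.time)


subunit_log = [Observation(3, "enemy fighter passed"), Observation(48, "street clear")]
main_log = [Observation(49, "objective in sight")]

flushed = merge_flushing(subunit_log, main_log, now=50, horizon=10)
kept = merge_keeping_history(subunit_log, main_log)

print([o.what for o in flushed])  # ['street clear', 'objective in sight']
print([o.what for o in kept])     # ['enemy fighter passed', 'street clear', 'objective in sight']
```

The flushing merge is smaller on the wire, which matters on a brief rejoin, but it silently loses the one old sighting that the rest of the group never had.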