The equation king-man+woman=queen suggests that vector addition is not necessarily a fundamental operation in the space of embedded words, but instead we want a weaker operation that is kind of like addition but where we cannot pinpoint a zero vector.
In mathematics, we sometimes want to define a new axiomatization for an algebraic structure where we do not have any presence of a zero element. The best way to do this is to replace our binary fundamental operation with a ternary operation. For example, the notion of a median algebra is an adaptation of the notion of a distributive lattice where we do not have any notion of which way is up and which way is down but if we have two points and we specify which one we want to be on top and which one we want to be on the bottom, then the interval between those two points will be a distributive lattice. As another example, the notion of a heap is what you would get if you took the notion of a group but you removed the identity element.
A heap is an algebraic structure (X,t) where X is a set and t is a ternary operation that satisfies the identities
t(t(v,w,x),y,z)=t(v,w,t(x,y,z))=t(v,t(y,x,w),z) and t(x,x,y)=t(y,x,x)=y.
If (G,⋅) is a group, then we can define a ternary heap operation t:G3→G by setting t(x,y,z)=xy−1z. For example, t(king,man,woman)=queen.
Similarly, if t:G3→G is a heap operation on a set G and a∈G (a can be called a basepoint), then we can define a group operation ⋅a by setting x⋅ay=t(x,a,y). The category of groups is isomorphic to the category of heaps with a basepoint a.
We can adapt other classes of mathematical structures to the case where we remove a basepoint including the classes of rings, vector spaces, and even inner product spaces if we really wanted to, and an affine space is the thing we obtain when we take a vector space and remove the origin but keep everything else intact in such a way that we can recover everything from the affine space and the origin. In an affine space V over a field K (the field K has an origin, but the space V does not), we cannot define scalar multiplication since that requires a basepoint, but we can declare a point in V to be the origin for the purposes of performing scalar multiplication. We therefore define a scalar multiplication operation s:K×V×V→V by setting S(α,x,y)=α⋅(y−x)+x=αy+(1−α)x which is scalar multiplication where x is declared to be the origin. From this scalar multiplication, we can easily define the mean of several elements and weighed means (except in some finite fields). So if we are given data x1,…,xn in an affine space V, we can still average x1,…,xn and then declare the average to be the origin of V and then we have obtained a vector space instead of just an affine space.
Suppose that K,X are finite dimensional vector spaces over a field F, and let ι:K→X be a vector space isomorphism. Let t denote the heap operations on these vector spaces. Let f:X→X be a permutation, and let E:K×X→X be the function defined by E(k,x)=ι(k)+f(x). If k∈K, then define Ek:X→X by setting Ek(x)=E(k,x). Observe that E−1k(y)=f−1(y−ι(k)).
We can define (in a model theoretic sense) the heap operations on K and X from E, but we are not necessarily able to define the group operations on K and X from E.
If j,k,l∈K, then Ej(E−1k(x))=ι(j)+f(E−1k(x))=ι(j)+f(f−1(x−ι(k)))=ι(j)+x−ι(k)=ι(j−k)+x.
Given i,j,k∈K, there is a unique l∈K where Ei∘E−1j=El∘E−1k since Ei∘E−1j=El∘E−1k precisely when i−j=l−k which happens precisely when i−j+l=k. We can therefore define the heap operation t on K from E.
Given x,y,z∈X, there are j,k∈K where Ej(E−1k(y))=ι(j−k)+y=x. We can therefore define t:X3→X from E by setting t(x,y,z)=Ej(E−1k(z))=ι(j−k)+z=x−y+z. We can define the heap operations on K,X from E, but we cannot necessarily define the zero element from E.
For example, K,X could be vector spaces over the finite field with 2 elements and E could be the round function in a block cipher (we observe that permutation-substitution encryption functions like the AES encryption are simply recurrent neural networks over finite fields; AES uses x↦x−1,0↦0 as its activation function). More generally K,X could be vector spaces, and f:X→X could be an RNN layer.
Let us not forget that anyone who has studied a little bit of physics knows that there is a lot of physical intuition for affine spaces without origins. We cannot pinpoint a precise location and declare that location to be the center of the universe. We cannot say that an object is at rest; we can only say that an object is at rest relative to another object.
Great answer! I didn’t follow the fine details of your response, but “there is no meaningful zero point” was the key point I wanted to get across. This is the first correct answer I’ve seen!
The equation king-man+woman=queen suggests that vector addition is not necessarily a fundamental operation in the space of embedded words, but instead we want a weaker operation that is kind of like addition but where we cannot pinpoint a zero vector.
In mathematics, we sometimes want to define a new axiomatization for an algebraic structure where we do not have any presence of a zero element. The best way to do this is to replace our binary fundamental operation with a ternary operation. For example, the notion of a median algebra is an adaptation of the notion of a distributive lattice where we do not have any notion of which way is up and which way is down but if we have two points and we specify which one we want to be on top and which one we want to be on the bottom, then the interval between those two points will be a distributive lattice. As another example, the notion of a heap is what you would get if you took the notion of a group but you removed the identity element.
A heap is an algebraic structure (X,t) where X is a set and t is a ternary operation that satisfies the identities
t(t(v,w,x),y,z)=t(v,w,t(x,y,z))=t(v,t(y,x,w),z) and t(x,x,y)=t(y,x,x)=y.
If (G,⋅) is a group, then we can define a ternary heap operation t:G3→G by setting t(x,y,z)=xy−1z. For example, t(king,man,woman)=queen.
Similarly, if t:G3→G is a heap operation on a set G and a∈G (a can be called a basepoint), then we can define a group operation ⋅a by setting x⋅ay=t(x,a,y). The category of groups is isomorphic to the category of heaps with a basepoint a.
We can adapt other classes of mathematical structures to the case where we remove a basepoint including the classes of rings, vector spaces, and even inner product spaces if we really wanted to, and an affine space is the thing we obtain when we take a vector space and remove the origin but keep everything else intact in such a way that we can recover everything from the affine space and the origin. In an affine space V over a field K (the field K has an origin, but the space V does not), we cannot define scalar multiplication since that requires a basepoint, but we can declare a point in V to be the origin for the purposes of performing scalar multiplication. We therefore define a scalar multiplication operation s:K×V×V→V by setting S(α,x,y)=α⋅(y−x)+x=αy+(1−α)x which is scalar multiplication where x is declared to be the origin. From this scalar multiplication, we can easily define the mean of several elements and weighed means (except in some finite fields). So if we are given data x1,…,xn in an affine space V, we can still average x1,…,xn and then declare the average to be the origin of V and then we have obtained a vector space instead of just an affine space.
Suppose that K,X are finite dimensional vector spaces over a field F, and let ι:K→X be a vector space isomorphism. Let t denote the heap operations on these vector spaces. Let f:X→X be a permutation, and let E:K×X→X be the function defined by E(k,x)=ι(k)+f(x). If k∈K, then define Ek:X→X by setting Ek(x)=E(k,x). Observe that E−1k(y)=f−1(y−ι(k)).
We can define (in a model theoretic sense) the heap operations on K and X from E, but we are not necessarily able to define the group operations on K and X from E.
If j,k,l∈K, then Ej(E−1k(x))=ι(j)+f(E−1k(x))=ι(j)+f(f−1(x−ι(k)))=ι(j)+x−ι(k)=ι(j−k)+x.
Given i,j,k∈K, there is a unique l∈K where Ei∘E−1j=El∘E−1k since Ei∘E−1j=El∘E−1k precisely when i−j=l−k which happens precisely when i−j+l=k. We can therefore define the heap operation t on K from E.
Given x,y,z∈X, there are j,k∈K where Ej(E−1k(y))=ι(j−k)+y=x. We can therefore define t:X3→X from E by setting t(x,y,z)=Ej(E−1k(z))=ι(j−k)+z=x−y+z. We can define the heap operations on K,X from E, but we cannot necessarily define the zero element from E.
For example, K,X could be vector spaces over the finite field with 2 elements and E could be the round function in a block cipher (we observe that permutation-substitution encryption functions like the AES encryption are simply recurrent neural networks over finite fields; AES uses x↦x−1,0↦0 as its activation function). More generally K,X could be vector spaces, and f:X→X could be an RNN layer.
Let us not forget that anyone who has studied a little bit of physics knows that there is a lot of physical intuition for affine spaces without origins. We cannot pinpoint a precise location and declare that location to be the center of the universe. We cannot say that an object is at rest; we can only say that an object is at rest relative to another object.
Great answer! I didn’t follow the fine details of your response, but “there is no meaningful zero point” was the key point I wanted to get across. This is the first correct answer I’ve seen!
(I imagine you know this, but for the sake of future readers) I think the word Torsor is relevant here :) https://math.ucr.edu/home/baez/torsors.html is a nice informal introduction.