This will be an “ML-oriented” introduction to representation theory of finite groups. It is an introductory sketch that orients towards both a language and a result (the “real semisimplicity” result) that is useful for thinking about the subject in an interpretability context.

This is a somewhat low-effort math post (subbing in for a planned post I’m running late on). It isn’t a prerequisite for anything, but will be useful background for a future discussion of how NN’s learn associative group multiplication (a joint project with Louis Jaburi). I’m hoping for this to be readable by people with a very small amount of background in abstract algebra: it is meant to fill in some very basic knowledge gaps. I’d be very interested in comments about what is confusing or difficult about this post, to help focus on the right exposition directions when writing about this in the future.

I’m not going to assume much abstract algebra, other than the notions of a group and an algebra (always assumed to be a unital, associative algebra).

Algebras of note

Recall that all the set ${Mat}_{n \times n}$ of $n \times n$ (real-valued) matrices forms an algebra (it’s a vector space with an associative product that distributes across addition). For any field F, there is a notion of a matrix algebra over F and ${Mat}_{n \times n}$ is naturally an algebra over $R$ . The set ${Mat}_{n \times n} (C)$ of complex-valued matrices forms an algebra over $C$ , but it can also be viewed as an algebra over $R$ , in the same way as $C$ can be viewed as a two-dimensional vector space.

We will be interested in understanding the “algebra” associated to a finite group G. It turns out that working with algebras over $C$ is easier than working with algebras over $R$ from a theoretical viewpoint (this is of course because $C$ is algebraically closed, i.e., any polynomial equation has a solution. We’ll start out looking at such representations. However unfortunately, neural nets don’t care about theoretical ease: instead they care about compression and efficiency, and this will necessitate also understanding a bit of representation theory over $R$ .

For starters, we work over $C$ . We’ll need one more useful algebra that will be one of the main characters in our discussion: namely, the group algebra of G, written $C [G]$ . This is the algebra whose elements are formal complex linear combinations of elements of $G$ , and where product is induced from the product on $G$ distributively. We analogously have the real-valued group algebra $R [G]$ . When we don’t want to commit to either real or complex coefficients and want to treat them simultaneously, we’ll write $F [G]$ for F one of our fields $R$ or $C$ (a bit of the analysis here carries over to more general fields, though usually one should require the characteristic of the field to be relatively prime to the order $| G |$ of the group for the theory to have comparable niceness).

So for example if $G = C_{n}$ is the cyclic group on n elements ( $C_{n}$ is better notation for the group that is sometimes written $Z_{n}$ or $Z / n$ ), then an element of $C [G]$ is a sum $λ_{0} 0 + λ_{1} 1 + λ_{2} 2 + \dots λ_{n - 1} n - 1 .$ Here the “bolded” numbers are group elements. Note of course that since we’re viewing it as a group, multiplication in $C_{n}$ corresponds to addition of numbers: i.e., $a \cdot b = a + b mod n .$ (This is another reason why $C_{n}$ is better notation—since $Z / n$ could also mean the ring, multiplicative and additive notions can be more confusing.) To see how group multiplication in this algebra works, let’s work out an example with $G = C_{3}$ : $(4 \cdot 1 + 5 \cdot 2) \cdot (- 3 \cdot 0 + 2) =$ $- 121 \cdot 0 + 41 \cdot 2 - 15 \cdot 2 \cdot 0 + 10 \cdot 2 \cdot 2 =$ $4 \cdot 0 + (- 12 + 10) \cdot 1 - 15 \cdot 2 .$ (Here remember that we’re writing the group multiplicatively, so for example $2 \cdot 2 = 1$ since $2 + 2 \equiv 1 mod 3$ .)

Representations

Now for the purposes of this note we define an n-dimensional representation of $G$ to be a map of algebras from the group algebra to a matrix algebra: $ρ : F [G] \to {Mat}_{n \times n} (F) .$ Here F can either be the field $R$ (“real-valued representations”) or the field $C$ (“complex-valued representations”).

Of course by linearity, it’s enough to specify $ρ (g)$ for elements $g \in F [G]$ , which are the standard basis for the group algebra: i.e., $ρ$ can be recovered from its restriction $ρ ∣_{G} : G \to {Mat}_{n \times n}$ . Conversely, any map $ρ_{0} : G \to {Mat}_{n \times n}$ leads to a map of algebras if it’s

Unital, i.e. $ρ_{0} (1) = 1$ and
A multiplicative homomorphism, i.e., $ρ (g h) = ρ (g) ρ (h)$ (product of matrices on the RHS).

Note that setting $h = g^{- 1}$ and using the homomrphism property tells us that $ρ (g)$ must always be invertible, i.e., an element of the group of invertible matrices $G L_{n} (F) \subset {Mat}_{n \times n} (F)$ ; this shows that our notion of representation is equivalent to the usual notion of group homomorphism $ρ_{0} : G \to G L_{n} (F)$ . We say that two representations $ρ_{1}, ρ_{2}$ are equivalent if

they have the same dimension (i.e., both n-dimensional for the same n) and
they are conjugate: so there exists some other invertible matrix $γ \in G L_{n} (F)$ such that, for all $g \in G$ , we have the identity $ρ_{2} (g) = γ ρ_{1} (g) γ^{- 1}$ (if you haven’t seen expressions like this, convince yourself that the formula on the right defines a new representation of $G$ for any fixed invertible $γ$ .) In this case we think of $ρ_{1}$ and $ρ_{2}$ to act “via the same symmetries”, just expressed in different bases (the standard basis in the case of $ρ_{1}$ and the basis given by $γ$ in the case of $ρ_{2}$ ).

If you’ve ever heard of group representations, you’ve heard of the notion of irreducibility. A representation $ρ$ is irreducible if it cannot be decomposed into smaller components: i.e., if it isn’t conjugate to a block-diagonal representation of the form

$g \mapsto (\begin{matrix} ρ_{1} (g) & 0 0 & ρ_{2} (g) \end{matrix}),$ for $ρ_{1}, ρ_{2}$ nonzero representations.

The “reducible” representation above (check this is a representation if you don’t believe me!) is denoted $ρ_{1} \oplus ρ_{2}$ .

Now the big secret point of representation theory is that the set of irreducible representations (up to isomorphism) “contain all the algebraic information” about G. This is encompassed by a series of results; an appealing consequence of these results (and a way of condensing much of the information contained in them) is the semisimplicity theorem below.

Let $ρ_{1} : C [G] \to {Mat}_{n_{1} \times n_{1}} (C), \dots ρ_{k} : G \to {Mat}_{n_{k} \times n_{k}} (G)$ be a collection of representations that contains each irreducible representation exactly once up to conjugacy. (Usually, since conjugacy, i.e., “isomorphism”, simply corresponds to choosing a different basis, people will use “sum of all irreducible representations” to mean this object). Here $n_{1}, \dots, n_{k}$ are the dimensions of the irreducible representations. The classification of (complex) representations of G and the eigenvalues of the matrices $ρ_{i} (g)$ is called character theory (something that gets covered in a first course on representation theory, but that we won’t get into here). Character theory in particular implies that the number (k) of distinct irreducible complex representations is equal to the number of conjugacy classes of G, i.e., the number of elements on the set obtained from G by identifying $g$ and $γ g γ^{- 1}$ for any G. If G is commutative, then each conjugacy class just has a single element (each element g above only gets identified with itself), so in the special case of commutative groups (also known as “abelian” groups), the number of conjugacy classes is equal to the number of elements, i.e., $k = | G |$ .

Theorem (semisimplicity of the group algebra):

$ρ_{1} \oplus ρ_{2} \oplus \dots \oplus ρ_{k}$ be the direct sum of all irreducible representations of G. Then the map

$C [G] \to {Mat}_{n_{1} + n_{2} + \dots + n_{k}} (C)$

is injective, and identifies $C [G]$ with the set of all block diagonal matrices

$⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ \begin{matrix} M_{1} & 0 & \dots & 0 0 & M_{2} & \dots & 0 ⋮ & ⋮ & ⋱ & ⋮ 0 & 0 & \dots & M_{k} \end{matrix} ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ .$

(Here the $M_{i}$ are $n_{i} \times n_{i}$ complex-valued matrices). In other words if we only care about the algebra $C [G]$ , then all of its structure is fully determined by the (unordered) collection of irreducible representation dimensions $n_{1}, \dots, n_{k}$ . Since there are a lot of groups, this implies that there are some pairs of distinct groups G, H that have isomorphic matrix algebras! I.e., while the groups cannot be identified with each other, every element of G can be identified with a linear combination of elements of H in a way that respects multiplication and admits an inverse (once again linear combination-valued): indeed, if the representation dimensions of G and H coincide, then as algebras, we can identify both group algebras with the same algebra of block-diagonal matrices. A simple example of a pair of distinct groups with isomorphic group algebras are the groups $C_{9}$ and $C_{3} \times C_{3}$ ^[1].

Real reprsentation theory, and “representation type”.

But once again, if we want to understand how NN’s process groups, we need to understand the “more efficient” representation theory over $R$ . Here we can start our analysis by looking at the same picture as before: namely, while we have no guarantee that the matrix coefficients of a matrix $ρ (g)$ in a complex representation has real coefficients, we do have a real subring $R [G] \subset C [G]$ , namely real-valued formal linear combinations of group elements viewed as a subring of complex-valued ones. Since each complex linear combination can be uniquely identified with a unique block-diagonal matrix ${Block}_{C} := ⎛ ⎜ ⎜ ⎝ \begin{matrix} M_{1} & \dots & 0 ⋮ & ⋱ & ⋮ 0 & \dots & M_{k} \end{matrix} ⎞ ⎟ ⎟ ⎠,$ we want to understand what kinds of subrings we can obtain. This must be a “decomplexification” (more standardly called “real form”) of ${Block}_{C}$ , a subring ${Block}_{R} \subset {Block}_{C} .$ which has the following properties:

${Block}_{R}$ is closed under multiplication by real scalars (but not necessarily complex scalars),
${Block}_{R}$ has half the dimension of the real vector space ${Block}_{C}$ .
each block-diagonal matrix $M \in {Block}_{C}$ can be expressed as $M_{0} + i M_{1},$ for $M_{0}, M_{1}$ in ${Block}_{R}$ .

Of course, one “real form” is the set of block-diagonal matrices as above with real coefficients, and sometimes this does correctly capture the image of the real group algebra (this is the case, for example, for the symmetric group $S_{n}$ ). However, there exist other possibilities for the real form. The property of being “real” (i.e., having real coefficients as a formal linear combination of group elements) as a group algebra vector may not be compatible (even after allowing a fixed conjugation) with the property of being “real” (i.e., having real coefficients as a matrix) as a block-diagonal matrix. In other words, the group G (and its real group algebra) “chooses” a specific real form of the algebra of block diagonal matrices.

Such real forms can be fully classified, and there are exactly three things that can happen to some $n \times n$ block ${Mat}_{n \times n} (C) .$

It gets replaced by (a conjugate of) ${Mat}_{n \times n} (R) \subset {Mat}_{n \times n} (C) .$ Irreducible representations $ρ_{i}$ for which this happens for the corresponding block are said to be of “real type” (note that being of real type is a property of a complex-valued representation!).
It gets combined with another $n \times n$ block and (again up to conjugating by some invertible matrix $γ$ ), a block-diagonal matrix spanning these two blocks $(\begin{matrix} A & 0 0 & B \end{matrix}) \in {Block}_{R}$ is “real” if and only if A and B are (coordinatewise) complex conjugates: i.e., coordinatewise, $A = ¯ B$ . Pairs of representations $ρ_{i}, ρ_{j}$ for which this happens are said to be of “complex type”, and they always satisfy $ρ_{i} (g) = {¯ ρ}_{j} (g)$ (after possibly replacing one of $ρ_{i}, ρ_{j}$ by a conjugate).
The dimension n of the corresponding block is even and, up to conjugation, the matrices $M_{i} \in {Mat}_{n \times n} (R)$ in the real form associated to this block can be split into $2 \times 2$ blocks:
$M_{i} = ⎛ ⎜ ⎜ ⎜ ⎝ \begin{matrix} h_{0, 0} & \dots & h_{0, n / 2} ⋮ & ⋱ & ⋮ h_{n / 2, 0} & \dots & h_{n / 2, n / 2} \end{matrix} ⎞ ⎟ ⎟ ⎟ ⎠,$
where each $h_{i j} \in {Mat}_{2 \times 2} (C)$ above is a complex-valued $2 \times 2$ quaternion matrix, i.e., a matrix of the form
$h_{i j}$ = a1 + bi + ci + dj, where $a, \dots, d$ are real numbers and 1 = $(\begin{matrix} 1 & 0 0 & 1 \end{matrix})$ is the usual unit matrix, and we have
i = $(\begin{matrix} i & 0 0 & i \end{matrix})$ ,
j = $(\begin{matrix} 0 & 1 - 1 & 0 \end{matrix})$ and
k = $(\begin{matrix} 0 & i - i & 0 \end{matrix})$
(These matrices are known in physics as the Pauli matrices.)
The resulting 4-dimensional real subalgebra of ${Mat}_{2 \times 2}$ is called the quaternion algebra, denoted $H$ . It has the wonderful property of being the only (finite-dimensional) division algebra over $R$ (i.e. algebra all of whose nonzero elements are invertible) except for $C$ and $R$ itself, and the only non-commutative one. Representations $ρ_{i}$ whose associated block exhibits this behavior are said to be of “quaternionic type”.

Together, these results imply that whereas the complex group algebra $C [G]$ is always (isomorphic to) a direct sum of matrix algebras ${Mat}_{n_{1} \times n_{1}} \oplus \dots \oplus {Mat}_{n_{k} \times n_{k}}$ (this expression, with the symbol $\oplus$ denoting “direct sum”, is the standard mathematical notation for the block-diagonal algebra), the real group algebra $R [G]$ is isomorphic to a direct sum of matrix algebras of three forms:

Real matrix algebras ${Mat}_{n \times n} (R)$ (each is associated to a single complex representation of dimension n).
Complex matrix algebras ${Mat}_{n \times n} (C)$ (but viewed as an algebra over $R$ ). These components of the real-valued group algebra “combine” two complex representations, so there are half as many such components as there are irreducible (complex-valued) representations of complex type.
Quaternionic matrix algebras ${Mat}_{n \times n} (H)$ (each is associated to a single complex representation of dimension 2n).

Examples

We conclude by giving some explicit examples of the comparison between the group algebra the block-diagonal matrix algebra associated to irreducible representations.

Complex representations of cyclic groups

Let $G = C_{n}$ be the cyclic group on n elements. In this case it turns out each irreducible representation of G is one-dimensional and there are n of these. By our semisimplicity result above, this is equivalent to the fact that the group algebra $C [G]$ is isomorphic to the algebra of diagonal $n \times n$ complex matrices. To see this it is enough to construct an isomorphism.

Let $ζ := exp (2 π i / n)$ be the elementary nth root of unity. For any residue \[a\in C_n\], we write $M (a) = diag (1, ζ^{a}, ζ^{2 a}, \dots, ζ^{(n - 1) a}) .$

Here in order to make sense of $ζ^{a}$ where a is a residue (rather than an integer), we’re implicitly using the fact that $ζ^{n} = 1$ (and so exponents of $ζ$ only depend on their residue mod n). The mapping above is obviously a homomorphism (i.e., compatible with multiplicaiton). It’s a standard fact that the above diagonal matrices are linearly independent as a ranges over the group elements. Since both the algebra of $n \times n$ diagonal matrices is n-dimensional, these n linearly independent vectors must form a basis; since we have identified a basis of the group algebra $C [G]$ with a basis of the block-diagonal matrix algebra, this is an isomorphism. If we look at the diagonal matrix associated to the linear combination $λ_{0} 0 + \dots + λ_{n - 1} n - 1,$ its kth element is $λ_{0} + λ_{1} ζ + λ_{2} ζ^{2} + \dots + λ_{n - 1} ζ^{n - 1} .$ Looking at this formula, we see that we have rediscovered the finite Fourier transform: indeed, if we view the coefficients $n \mapsto λ_{n}$ as a complex-valued function $λ$ on \the group elements $0, \dots, n - 1$ , then the (k,k)th matrix coefficient of the associated diagonal matrix is the Fourier transform $^λ (k)$ . The homomorphism property in this case is equivalent to observing that the convolution product (which is equivalent to the group algebra product—checking this is a nice exercise) maps to the elementwise product after Fourier transform (and of course multiplying diagonal matrices multiplies their coefficients elementwise).

Real representations of cyclic groups

Here there is a difference between even and odd n. The results of the previous section tell us that there are n different complex one-dimensional representations $ρ_{i}$ (with the kth representation corresponding to taking an element of the group algebra to the kth diagonal element). A one-dimensional representation just takes each element of $a \in C_{n}$ to an (invertible) element $ρ_{k} (a) \in C,$ and is given by the formula $ρ_{k} (a) = ζ^{k a} .$ For all n, the representation $ρ_{0}$ is of real type: indeed, it takes every element of G to 1, which is a real-valued $1 \times 1$ matrix. If n is even, then the representation $ρ_{n / 2}$ is also real-valued, as it takes each element a to $(- 1)^{a} \in \pm 1$ . And whether n is even or odd, all other $ρ_{k}$ are of complex type: we can check this simply by seeing that $ρ_{k}$ is not isomorphic to the complex conjugate representation ${¯ ρ}_{k} = ρ_{- k}$ . (Note since two matrix representations are isomorphic if and only if they’re conjugate by some fixed matrix $γ$ , any 1-dimensional representation will be conjugate only to itself. Thus it’s enough to observe that for $k \neq 0, n / 2$ , ${¯ ρ}_{k} (1) =^{k} = ζ^{- k} \neq ρ_{k} (1)$ to see that $ρ_{k}$ is of complex type. Thus semisimplicity in the real case tells us that the group algebra $R [C_{n}]$ is isomorphic

$R \oplus C^{\frac{n - 1}{2}},$ if n is odd and
$R^{2} \oplus C^{\frac{n - 2}{2}},$ if n is even.

(Here an “exponent” d is the direct sum expression is shorthand for taking the direct sum of d copies of the same component).

The group of elementary quaternions

Unspurprisingly (since the quaternions are a noncommutative division ring), no commutative group has representations of quaternionic type. The smallest group with a representation of quaternionic type is the group $G = Q_{8}$ of elementary quaternions, with the 8 elements 1, −1, i, -i, j, -j, k, -k. We can identify each of these elements with the corresponding $2 \times 2$ “Pauli” matrix above, and then they form a group with identity 1 and the following multiplication table:

(-1) $\pm α = \mp α$ for $α$ one of the symbols 1, i, j, k;
i^2 = j^2 = k^2 = −1;
ij = k, ji = -k, jk = i, kj = -i, ki = j, ik = -j.

This group has one 2-dimensional complex representation of quaternionic type, thus giving a map $Q_{8} \to {Mat}_{1 \times 1} H = H$ (remember that the quaternionic dimension is half the complex dimension). This is the obvious map, taking each element of the group to the corresponding quaternion; it also has 4 different 1-dimensional real representations: one is (as for any group) the trivial representation, taking every $g \in G$ to 1, and the other three factoring through one of three surjective homomorphisms $Q_{8} \to C_{2},$ which ignore sign and take 2 of the three letters i, j, k to −1. There are no other representations, so we see that the real group algebra $R [Q_{8}]$ is isomorphic to the direct sum $H \oplus R^{4} .$ In particular, $Q_{8}$ has no representations of complex type.

^
In fact, these two algebras have the same group algebras over $R$ , which is a stronger result. Similarly, the Klein 4-group $C_{2} \times C_{2}$ has the same complex group algebra as the cyclic group $C_{4}$ , but these have nonisomorphic group algebras over $R$ .

The absolute basics of representation theory of finite groups