Another thing that you can do when folding associative operations over a sequence is re-parenthesize them in a way that makes them more efficient to compute.
For example, if A is a 1x1000 vector, B is a 1000x1000 square matrix, and C is a 1000x1 vector, (AB)(BC) takes less time and space to compute than A(BB)C since the intermediate 1000x1000 BB needs to be stored explicitly in the latter.
In the following python snippet, the computation of A(BB)C takes around 3/4ths of a second, while (AB)(BC) takes a handful of milliseconds.
from time import time
from numpy.random import random
A = random((1, 1000))
B = random((1000, 1000))
C = random((1000, 1))
a0 = time(); print(A.dot(B.dot(B)).dot(C)); a1 = time(); print(a1 - a0)
b0 = time(); print(A.dot(B).dot(B.dot(C))); b1 = time(); print(b1 - b0)
(Ironically, the values can be slightly different since floating point multiplication and addition aren’t associative, unlike the corresponding operations over reals.)
Along the same lines, folds of associative operations over a sequence can be parallelized by parenthesizing the sequence as a balanced binary tree and evaluating the instances of the operation on separate nodes.
Another thing that you can do when folding associative operations over a sequence is re-parenthesize them in a way that makes them more efficient to compute.
For example, if A is a 1x1000 vector, B is a 1000x1000 square matrix, and C is a 1000x1 vector, (AB)(BC) takes less time and space to compute than A(BB)C since the intermediate 1000x1000 BB needs to be stored explicitly in the latter.
In the following python snippet, the computation of A(BB)C takes around 3/4ths of a second, while (AB)(BC) takes a handful of milliseconds.
(Ironically, the values can be slightly different since floating point multiplication and addition aren’t associative, unlike the corresponding operations over reals.)
Along the same lines, folds of associative operations over a sequence can be parallelized by parenthesizing the sequence as a balanced binary tree and evaluating the instances of the operation on separate nodes.