The theorem: where k is relatively prime to an odd prime p and n<e, k⋅pn is a square mod pe iff k is a square mod p and n is even.
The real meat of the theorem is the n=0 case (i.e. a square mod p that isn’t a multiple of p is also a square mod pe. Deriving the general case from there should be fairly straightforward, so let’s focus on this special case.
Why is it true? This question has a surprising answer: Newton’s method for finding roots of functions. Specifically, we want to find a root of f(x):=x2−k, except in Z/peZ instead of R.
To adapt Newton’s method to work in this situation, we’ll need the p-adic absolute value on Z: |k⋅pn|p:=p−n for k relatively prime to p. This has lots of properties that you should expect of an “absolute value”: it’s positive (|x|p≥0 with = only when x=0), multiplicative (|xy|p=|x|p|y|p), symmetric (|−x|p=|x|p), and satisfies a triangle inequality (|x+y|p≤|x|p+|y|p; in fact, we get more in this case: |x+y|p≤max(|x|p,|y|p)). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, ultrametric, because of the strong version of the triangle inequality) d(x,y):=|x−y|p. To visualize this distance function, draw p giant circles, and sort integers into circles based on their value mod p. Then draw p smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod p2. Then draw p even smaller circles inside each of those, and sort based on value mod p3, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, 1,p,p2,p3,... converges to 0.
Now on to Newton’s method: if k is a square mod p, let a be one of its square roots mod p. |f(a)|p≤p−1; that is, a is somewhat close to being a root of f with respect to the p-adic absolute value. f′(x)=2x, so |f'(a)|p=|2a|p=|2|p⋅|a|p=1⋅1=1; that is, f is steep near a. This is good, because starting close to a root and the slope of the function being steep enough are things that helps Newton’s method converge; in general, it might bounce around chaotically instead. Specifically, It turns out that, in this case, |f(a))|p<|f'(a)|p is exactly the right sense of being close enough to a root with steep enough slope for Newton’s method to work.
Now, Newton’s method says that, from a, you should go to a1:=a−f(a)f'(a)=a−a2−k2a. 2a is invertible mod pe, so we can do this. Now here’s the kicker: f(a1)=(a−a2−k2a)2−k=a2−2a(a2−k2a)+(a2−k2a)2−k=(a2−k)2(2a)2, so |f(a1)|p=|a2−k|2p|2a|2p=|a2−k|2p<|a2−k|p. That is, a1 is closer to being a root of f than a is. Now we can just iterate this process until we reach ai with |f(ai)|p≤p−e, and we’ve found our square root of k mod pe.
Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.
The part about derivatives might have seemed a little odd. After all, you might think, Z is a discrete set, so what does it mean to take derivatives of functions on it. One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we’re using a different metric than usual, and Z isn’t discrete at all! Indeed, for any number k, limn→∞k+pn=k, so no points are isolated, and we can define differentiation of functions on Z in exactly the usual way with limits.
The theorem: where k is relatively prime to an odd prime p and n<e, k⋅pn is a square mod pe iff k is a square mod p and n is even.
The real meat of the theorem is the n=0 case (i.e. a square mod p that isn’t a multiple of p is also a square mod pe. Deriving the general case from there should be fairly straightforward, so let’s focus on this special case.
Why is it true? This question has a surprising answer: Newton’s method for finding roots of functions. Specifically, we want to find a root of f(x):=x2−k, except in Z/peZ instead of R.
To adapt Newton’s method to work in this situation, we’ll need the p-adic absolute value on Z: |k⋅pn|p:=p−n for k relatively prime to p. This has lots of properties that you should expect of an “absolute value”: it’s positive (|x|p≥0 with = only when x=0), multiplicative (|xy|p=|x|p|y|p), symmetric (|−x|p=|x|p), and satisfies a triangle inequality (|x+y|p≤|x|p+|y|p; in fact, we get more in this case: |x+y|p≤max(|x|p,|y|p)). Because of positivity, symmetry, and the triangle inequality, the p-adic absolute value induces a metric (in fact, ultrametric, because of the strong version of the triangle inequality) d(x,y):=|x−y|p. To visualize this distance function, draw p giant circles, and sort integers into circles based on their value mod p. Then draw p smaller circles inside each of those giant circles, and sort the integers in the big circle into the smaller circles based on their value mod p2. Then draw p even smaller circles inside each of those, and sort based on value mod p3, and so on. The distance between two numbers corresponds to the size of the smallest circle encompassing both of them. Note that, in this metric, 1,p,p2,p3,... converges to 0.
Now on to Newton’s method: if k is a square mod p, let a be one of its square roots mod p. |f(a)|p≤p−1; that is, a is somewhat close to being a root of f with respect to the p-adic absolute value. f′(x)=2x, so |f'(a)|p=|2a|p=|2|p⋅|a|p=1⋅1=1; that is, f is steep near a. This is good, because starting close to a root and the slope of the function being steep enough are things that helps Newton’s method converge; in general, it might bounce around chaotically instead. Specifically, It turns out that, in this case, |f(a))|p<|f'(a)|p is exactly the right sense of being close enough to a root with steep enough slope for Newton’s method to work.
Now, Newton’s method says that, from a, you should go to a1:=a−f(a)f'(a)=a−a2−k2a. 2a is invertible mod pe, so we can do this. Now here’s the kicker: f(a1)=(a−a2−k2a)2−k=a2−2a(a2−k2a)+(a2−k2a)2−k=(a2−k)2(2a)2, so |f(a1)|p=|a2−k|2p|2a|2p=|a2−k|2p<|a2−k|p. That is, a1 is closer to being a root of f than a is. Now we can just iterate this process until we reach ai with |f(ai)|p≤p−e, and we’ve found our square root of k mod pe.
Exercise: Do the same thing with cube roots. Then with roots of arbitrary polynomials.
The part about derivatives might have seemed a little odd. After all, you might think, Z is a discrete set, so what does it mean to take derivatives of functions on it. One answer to this is to just differentiate symbolically using polynomial differentiation rules. But I think a better answer is to remember that we’re using a different metric than usual, and Z isn’t discrete at all! Indeed, for any number k, limn→∞k+pn=k, so no points are isolated, and we can define differentiation of functions on Z in exactly the usual way with limits.