Weak derivatives
In calculus, the product rule says $\frac{d}{dx}(f\cdot g)=f'\cdot g+f\cdot g'$. The fundamental theorem of calculus says that the Riemann integral acts as the anti-derivative.[1] Combining these two facts, we derive integration by parts (writing $F$ and $G$ for antiderivatives of $f$ and $g$):
$$\begin{aligned}
\frac{d}{dx}(F\cdot G) &= f\cdot G+F\cdot g\\
\int \frac{d}{dx}(F\cdot G)\,dx &= \int \left(f\cdot G+F\cdot g\right)dx\\
F\cdot G-\int F\cdot g\,dx &= \int f\cdot G\,dx.
\end{aligned}$$
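As a quick sanity check, here’s a small symbolic verification of this identity with sympy; the particular $F$ and $G$ are arbitrary choices of mine, purely for illustration.

```python
# A symbolic spot check of integration by parts.
# F and G are arbitrary illustrative antiderivatives, with f = F' and g = G'.
import sympy as sp

x = sp.symbols('x')
F = x**2
G = sp.sin(x)
f, g = sp.diff(F, x), sp.diff(G, x)

lhs = F * G - sp.integrate(F * g, x)   # F·G − ∫ F·g dx
rhs = sp.integrate(f * G, x)           # ∫ f·G dx

# The two sides agree up to a constant of integration.
assert sp.simplify(sp.diff(lhs - rhs, x)) == 0
```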
It turns out that we can use these two properties to generalize the derivative to match some of our intuitions on edge cases. Let’s think about the absolute value function:

(Image of the absolute value function, from Wikipedia.)
The boring old normal derivative isn’t defined at $x=0$, but it seems like it’d make sense to be able to say that the derivative there is, e.g., $0$. Why might this make sense?
Taylor’s theorem (and its generalizations) characterizes the first derivative as a tangent line with slope $L := f'(x_0)$ which provides a good local approximation of $f$ around $x_0$: $f(x)\approx f(x_0)+L(x-x_0)$. You can prove that this is the best approximation you can get using only $f(x_0)$ and $L$! In the absolute value example, defining the “derivative” to be zero at $x=0$ would minimize approximation error on average in neighborhoods around the origin.
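Here’s a minimal numerical sketch of that claim for the absolute value function. I’m measuring error with the mean squared error over a symmetric neighborhood of the origin (one natural choice, matching the Euclidean framing in the next paragraph); the minimizing slope comes out to $0$.

```python
# Among linear approximations |x| ≈ f(0) + L·x near the origin, find the
# slope L that minimizes the mean squared approximation error on [-1, 1].
import numpy as np

xs = np.linspace(-1, 1, 2001)          # a symmetric neighborhood of 0
slopes = np.linspace(-1.5, 1.5, 301)   # candidate slopes L

mse = [np.mean((np.abs(xs) - L * xs) ** 2) for L in slopes]
best = slopes[int(np.argmin(mse))]
print(f"best slope: {best:.3f}")       # ≈ 0.000
```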
In multivariable calculus, the Jacobian is a tangent plane which again minimizes approximation error (with respect to the Euclidean distance, usually) in neighborhoods around the point. That is, having a first derivative means that the function can be locally approximated by a linear map. It’s like a piece of paper that you glue onto the point in question.
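A tiny numerical sketch of that picture, using a toy map $f:\mathbb{R}^2\to\mathbb{R}^2$ of my own choosing: the error of the Jacobian’s linear approximation shrinks faster than $\lVert h\rVert$ as $h\to 0$.

```python
# "Differentiable" means locally well-approximated by a linear map: the
# approximation error f(p + h) − f(p) − J(p)h is o(||h||).
import numpy as np

def f(v):
    x, y = v
    return np.array([x * y, np.sin(x) + y ** 2])

def jacobian(v):
    x, y = v
    return np.array([[y, x],
                     [np.cos(x), 2 * y]])

p = np.array([1.0, 2.0])
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = eps * np.array([0.6, -0.8])    # a direction vector of length eps
    err = np.linalg.norm(f(p + h) - f(p) - jacobian(p) @ h)
    print(f"||h|| = {eps:.0e}   error/||h|| = {err / eps:.2e}")
```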
This reasoning even generalizes to the infinite-dimensional case with functional derivatives (see my recent functional analysis textbook review). All of these cases are instances of the Fréchet derivative.
Complex analysis provides another perspective on why this might make sense, but I think you get the idea and I’ll omit that for now.
We can define a weaker notion of differentiability which lets us do this – in fact, it lets us define the weak derivative to be anything at $x=0$! Now that I’ve given some motivation, here’s a great explanation of how weak derivatives arise from the criterion of “satisfy integration by parts for all relevant functions”.
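The standard criterion: $g$ is a weak derivative of $f$ when $\int f\,\varphi'\,dx = -\int g\,\varphi\,dx$ for every smooth, compactly supported test function $\varphi$ – just the integration-by-parts identity, with the boundary term killed by $\varphi$’s support. Here’s a rough numerical check of that criterion for $f(x)=|x|$ and the candidate $g(x)=\operatorname{sign}(x)$; the test functions are arbitrary choices of mine that vanish at the ends of the interval.

```python
# Numerical check of the weak-derivative criterion for f(x) = |x|:
#     ∫ f·φ' dx  =  −∫ g·φ dx   for test functions φ vanishing at ±1,
# with candidate weak derivative g(x) = sign(x). Whatever value we assign
# g at the single point x = 0 cannot change either integral.
import numpy as np

xs = np.linspace(-1, 1, 20001)
dx = xs[1] - xs[0]
integrate = lambda ys: np.sum(ys) * dx     # simple Riemann sum on a fine grid

f = np.abs(xs)
g = np.sign(xs)

# Two arbitrary smooth test functions vanishing at both endpoints.
test_functions = [
    (1 - xs**2) ** 2 * np.exp(xs),
    np.sin(np.pi * xs) ** 2 * (xs + 2),
]

for phi in test_functions:
    dphi = np.gradient(phi, xs)            # numerical φ'
    lhs = integrate(f * dphi)              # ∫ f·φ' dx
    rhs = -integrate(g * phi)              # −∫ g·φ dx
    print(f"{lhs:+.6f}  vs  {rhs:+.6f}")
```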
As far as I can tell, the indefinite Riemann integral being the anti-derivative means that it’s the inverse of $\frac{d}{dx}$ in the group-theoretic sense – with respect to composition in the $\mathbb{R}$-vector space of operators on real-valued functions. You might not expect this, because $\int$ maps an integrable function $f$ to a set of functions $\{F+C \mid C\in\mathbb{R}\}$. However, this doesn’t mean that the inverse isn’t unique (as it must be), because the inverse lives in operator space.
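A minimal sympy illustration of one direction of this: differentiating any antiderivative of $f$ recovers $f$, no matter which constant of integration you tack on. (The function $f$ below is an arbitrary choice.)

```python
# Differentiating any antiderivative of f recovers f, regardless of the
# constant of integration: the +C ambiguity disappears under d/dx.
import sympy as sp

x, C = sp.symbols('x C')
f = sp.exp(x) * sp.cos(x)      # arbitrary illustrative function

F = sp.integrate(f, x) + C     # any antiderivative of f
assert sp.simplify(sp.diff(F, x) - f) == 0
```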
The reason $f'(0)$ is undefined for the absolute value function is that you need the value to be the same for all sequences converging to $0$ – both from the left and from the right. There’s a nice way to motivate this in higher-dimensional settings by thinking about the action of e.g. complex multiplication, but this is a much stronger notion than real differentiability and I’m not quite sure how to think about motivating the single-valued real case yet. Of course, you can say things like “the theorems just work out nicer if you require that both one-sided limits be the same”...
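Concretely, for the absolute value function the difference quotients from the right converge to $+1$ while those from the left converge to $-1$, so no single value can serve as the two-sided limit:

```python
# One-sided difference quotients of |x| at 0: the right-hand quotients
# approach +1, the left-hand ones approach -1, so the classical (two-sided)
# derivative does not exist there.
for h in [0.1, 0.01, 0.001, 0.0001]:
    right = (abs(0 + h) - abs(0)) / h
    left = (abs(0 - h) - abs(0)) / (-h)
    print(f"h = {h:<7} right: {right:+.2f}   left: {left:+.2f}")
```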