Is it common to use Kalman filters for things that have nonlinear transformations, by approximating the posterior with a Gaussian (e.g. calculating the closest Gaussian distribution to the true posterior by JS divergence or the like)? How well would that work?
Grammar comment: you seem to have accidentally dropped a few words at:
Measuring multiple quantities: what if we want to measure two or more quantities, such as temperature and humidity? Furthermore, we might know that these are [missing words?] Then we now have multivariate normal distributions.
There are a number of Kalman-like things you can do when your updates are nonlinear.
The “extended Kalman filter” uses a local linear approximation to the update; there are also higher-order versions. The EKF unsurprisingly tends to do badly when the update is substantially nonlinear. The “unscented Kalman filter” uses (kinda) a finite-difference approximation instead of the derivative, deliberately taking points that aren’t super-close together to get an approximation that’s meaningful on the scale of your actual uncertainty. Going further in that direction you get “particle filters”, which represent your uncertainty not as a Gaussian but as a big pile of samples from its distribution. (There’s a ton of lore on all this stuff. I am in no way an expert on it.)
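To make the EKF step concrete, here’s a minimal sketch of one predict/update cycle in Python. The toy dynamics, measurement function, and all names here are my own illustration (not from the post or any particular library); a real EKF would use your actual model and its Jacobians.

```python
import numpy as np

# Toy 1D nonlinear system: the EKF linearizes the transition f and
# the measurement h at the current estimate via their Jacobians.

def f(x):        # nonlinear state transition (illustrative)
    return np.array([np.sin(x[0]) + 0.5 * x[0]])

def F_jac(x):    # Jacobian of f, evaluated at x
    return np.array([[np.cos(x[0]) + 0.5]])

def h(x):        # nonlinear measurement function (illustrative)
    return np.array([x[0] ** 2])

def H_jac(x):    # Jacobian of h, evaluated at x
    return np.array([[2.0 * x[0]]])

def ekf_step(x, P, z, Q, R):
    # Predict: push the mean through f, and the covariance through
    # the local linearization F (this is where the EKF approximates).
    F = F_jac(x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q

    # Update: the usual Kalman update, with H = Jacobian of h at the
    # predicted state standing in for a linear measurement matrix.
    H = H_jac(x_pred)
    y = z - h(x_pred)                    # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Example: one step with made-up noise levels and a made-up observation.
x, P = np.array([0.1]), np.eye(1)
Q, R = 0.01 * np.eye(1), 0.1 * np.eye(1)
x, P = ekf_step(x, P, z=np.array([0.02]), Q=Q, R=R)
```

The UKF replaces the Jacobians with a deterministic set of “sigma points” spread out on the scale of P, and a particle filter replaces the Gaussian (x, P) entirely with a weighted cloud of samples.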
Good post!
Thanks! Edited.