This is sometimes true in functional programming, but only if you’re careful.
I think this overstates the difficulty; referential transparency is the norm in functional programming, not something unusual.
For example, suppose the expression is a function call, and you change the function’s definition and restart your program. When that happens, you need to delete the out-of-date entries from the cache or your program will read an out-of-date answer.
As I understand, this system is mostly useful if you’re using it for almost every function. In that case, your inputs are hashes which contain the source code of the function that generated them, and therefore your caches will invalidate if an upstream function’s source code changed.
Also, since you’re using the text of an expression for the cache key, you should only use expressions that don’t refer to any local variables.
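A toy sketch (plain Python, not the actual system from the post) of why text-keyed caching and local variables don't mix:

```python
cache = {}

def cached_eval(expr_text, env=None):
    # The cache key is the *text* of the expression, so any two call
    # sites that spell the same expression share one cache entry.
    if expr_text not in cache:
        cache[expr_text] = eval(expr_text, env or {})
    return cache[expr_text]

assert cached_eval("2 + 3") == 5

# Hazard: the same text under different local bindings silently
# reuses the first answer.
assert cached_eval("x + 1", {"x": 1}) == 2
assert cached_eval("x + 1", {"x": 100}) == 2  # stale: the key never sees x
```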
Agreed.
So this might be okay in simple cases when you are working alone and know what you’re doing, but it likely would result in confusion when working on a team.
I agree that it’s essentially a framework, and you’d need buy-in from a team in order to consistently use it in a repository. But I’ve seen teams buy into heavier frameworks pretty regularly; this version seems unusual but not particularly hard to use/understand. It’s worth noting that bad caching systems are pretty common in data science, so something like this is potentially a big improvement there.
I think this overstates the difficulty; referential transparency is the norm in functional programming, not something unusual.
It really depends on what domain you’re working in. If you’re memoizing functions, you’re not allowed to use the following things (or rather, you can only use them in functions that are not transitively called by memoized functions):
Global mutable state (to no-one’s surprise)
A database, which is global mutable state
IO, including reading user input, fetching something non-static from the web, or logging
Networking with another service that has state
Getting the current date
Ask a programmer to obey this list of restrictions, and—depending on the domain they’re working in—they’ll either say “ok” or “wait what that’s most of what my code does”.
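To make the first restriction concrete, here’s a minimal Python illustration, using `functools.lru_cache` as a stand-in for the memoization under discussion:

```python
import functools

counter = {"n": 0}  # global mutable state

@functools.lru_cache(maxsize=None)
def bump(step):
    # Impure: reads and writes the global counter.
    counter["n"] += step
    return counter["n"]

assert bump(5) == 5
assert bump(5) == 5       # cache hit: the side effect silently never reruns
assert counter["n"] == 5  # a caller expecting 10 here is out of luck
```

The memoizer has no way to know that `bump` touched anything outside its arguments, so the cached answer and the skipped side effect quietly diverge from what an uncached run would do.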
As I understand, this system is mostly useful if you’re using it for almost every function. In that case, your inputs are hashes which contain the source code of the function that generated them, and therefore your caches will invalidate if an upstream function’s source code changed.
That’s very clever! I don’t think it’s sufficient, though.
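The code and cache snippets from this comment were lost in extraction; the original walks through “say you have this code… you run it once and get this cache… you fix the first function.” Here is my reconstruction of that scenario (a buggy add1, an add2 that calls it, and a cache keyed only on each function’s own name), as a toy Python sketch:

```python
import functools

# Toy persistent cache keyed on (function name, args). Crucially, the
# key does not see the source of the functions a function *calls*.
cache = {}

def memoize(f):
    @functools.wraps(f)
    def wrapper(*args):
        key = (f.__name__, args)
        if key not in cache:
            cache[key] = f(*args)
        return cache[key]
    return wrapper

# First run: add1 has a bug (it adds 10 instead of 1).
@memoize
def add1(x):
    return x + 10  # bug

@memoize
def add2(x):
    return add1(add1(x))

assert add2(100) == 120  # wrong answer, now cached under ('add2', (100,))

# "Second run": add1 is fixed, but the cache persisted.
@memoize
def add1(x):
    return x + 1  # fixed

@memoize
def add2(x):  # source unchanged, so its cache entry still looks valid
    return add1(add1(x))

assert add2(100) == 120  # stale! should be 102, but add2 never reruns
```

Invalidating on add2’s own source alone misses the change in add1, which is the transitive problem this comment goes on to describe.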
You run it again, which invokes (add2 100), which is found in the cache to be 120. The add2 cache entry is not invalidated because the add2 function has not changed, nor have its inputs. The add1 cache entries would be invalidated if anything ever invoked add1, but nothing does.
(This is what I meant by “You also have to look at the functions it calls (and the functions those call, etc.)” in my other comment.)
Ah, that’s a great example, thanks for spelling it out.

I think this is fixable. An invocation (f expr1 expr2) will produce the same result as the last time you invoked it if:
The body of f is the same as last time.
Every function it calls, including transitively, has the same source code as the last time you called f. Also every macro and type definition that is used transitively. Basically any code that it depends on in any way.
Every function involved is pure (no state, no IO).
Every function involved is top-level. I’m not sure this will play well with higher-order functions.
The invocations expr1 and expr2 also obey this checklist.
I’m not sure this list is exhaustive, but it should be do-able in principle. If I look at a function invocation and all the code it transitively depends on (say it’s 50% of the codebase), and I know that that 50% of the codebase hasn’t changed since last time you ran the program, and I see that that 50% of the codebase is pure, and I trust you that the other 50% of the codebase doesn’t muck with it (as it very well could with e.g. macros), then that function invocation should produce the same result as last time.
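A hedged sketch of the checklist’s core check: hash a function’s source together with everything it transitively depends on, and make that hash part of the cache key. (Dependency discovery is assumed solved here; the caller supplies the list of sources.)

```python
import hashlib

def transitive_source_hash(sources):
    # sources: the source text of f plus every function, macro, and
    # type definition it transitively depends on.
    h = hashlib.sha256()
    for src in sources:
        h.update(src.encode("utf-8"))
    return h.hexdigest()

def cache_key(sources, args):
    # An invocation's key: the transitive hash plus the argument values
    # (which must pass the same checklist themselves).
    return (transitive_source_hash(sources), args)

add1_src = "(define (add1 x) (+ x 1))"
add2_src = "(define (add2 x) (add1 (add1 x)))"

old_key = cache_key([add2_src, add1_src], (100,))
# Editing add1 changes add2's key even though add2's own source didn't:
new_key = cache_key([add2_src, "(define (add1 x) (+ x 10))"], (100,))
assert old_key != new_key
```

This repairs the add1/add2 example above: fixing add1 changes the key under which (add2 100) was cached, so the stale entry can never be hit.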
This is tricky enough that it might need language-level support to be practical. I’m glad that Isusr is thinking of it as “writing a compiler”.