True numbers and fake numbers
In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.
-- Lord Kelvin
If you believe that science is about describing things mathematically, you can fall into a strange sort of trap where you come up with some numerical quantity, discover interesting facts about it, use it to analyze real-world situations—but never actually get around to measuring it. I call such things “theoretical quantities” or “fake numbers”, as opposed to “measurable quantities” or “true numbers”.
An example of a “true number” is mass. We can measure the mass of a person or a car, and we use these values in engineering all the time. An example of a “fake number” is utility. I’ve never seen a concrete utility value used anywhere, though I always hear about nice mathematical laws that it must obey.
The difference is not just about units of measurement. In economics you can see fake numbers happily coexisting with true numbers using the same units. Price is a true number measured in dollars, and you see concrete values and graphs everywhere. “Consumer surplus” is also measured in dollars, but good luck calculating the consumer surplus of a single cheeseburger, never mind drawing a graph of aggregate consumer surplus for the US! If you ask five economists to calculate it, you’ll get five different indirect estimates, and it’s not obvious that there’s a true number to be measured in the first place.
Another example of a fake number is “complexity” or “maintainability” in software engineering. Sure, people have proposed different methods of measuring it. But if they were measuring a true number, I’d expect them to agree to the 3rd decimal place, which they don’t :-) The existence of multiple measuring methods that give the same result is one of the differences between a true number and a fake one. Another sign is what happens when two of these methods disagree: do people say that they’re both equally valid, or do they insist that one must be wrong and try to find the error?
It’s certainly possible to improve something without measuring it. You can learn to play the piano pretty well without quantifying your progress. But we should probably try harder to find measurable components of “intelligence”, “rationality”, “productivity” and other such things, because we’d be better at improving them if we had true numbers in our hands.
I think there is a tale to tell about the consumer surplus and it goes like this.
Alice loves widgets. She would pay $100 for a widget. She goes online and finds Bob offering widgets for sale for $100. Err, that is not really what she had in mind. She imagined paying $30 for a widget, and feeling $70 better off as a consequence. She emails Bob: How about $90?
Bob feels like giving up altogether. It takes him ten hours to hand craft a widget and the minimum wage where he lives is $10 an hour. He was offering widgets for $150. $100 is the absolute minimum. Bob replies: No.
While Alice is deciding whether to pay $100 for a widget that is only worth $100 to her, Carol puts the finishing touches to her widget making machine. At the press of a button Carol can produce a widget for only $10. She activates her website, offering widgets for $40. Alice orders one at once.
How would Eve the economist like to analyse this? She would like to identify a consumer surplus of 100 − 40 = 60 dollars, and a producer surplus of 40 − 10 = 30 dollars, for a total gain from trade of 60 + 30 = 90 dollars. But before she can do this she has to telephone Alice and Carol and find out the secret numbers, $100 and $10. Only the market price of $40 is overt.
Alice thinks Eve is spying for Carol. If Carol learns that Alice is willing to pay $100, she will up the price to $80. So Alice bullshits Eve: Yeh, I’m regretting my purchase, I’ve rushed to buy a widget, but what’s it worth really? $35. I’ve overpaid.
Carol thinks Eve is spying for Alice. If Alice learns that they only cost $10 to make, then she will bargain Carol down to $20. Carol bullshits Eve: Currently they cost me $45 to make, but if I can grow volumes I’ll get a bulk discount on raw materials and I hope to be making them for $35 and be in profit by 2016.
Eve realises that she isn’t going to be able to get the numbers she needs, so she values the trade at its market price and declares GDP to be $40. It is what economists do. It is the desperate expedient to which the opacity of business has reduced them.
Now for the twist in the tale. Carol presses the button on her widget making machine, which catches fire and is destroyed. Carol gives up widget making. Alice buys from Bob for $100. Neither is happy with the deal; the total of consumer surplus and producer surplus is zero. Alice is thinking that she would have been happier spending her $100 eating out. Bob is thinking that he would have had a nicer time earning his $100 waiting tables for 10 hours.
Eve revises her GDP estimate. She has committed herself to market prices, so it is up 150% at $100. Err, that is not what is supposed to happen. Vital machinery is lost in a fire, prices soar and goods are produced by tedious manual labour, the economy has gone to shit, producing no surplus instead of producing a $90 surplus. But Eve’s figures make this look good.
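To make the bookkeeping explicit, here is a minimal sketch using only the numbers from the story, showing what Eve would like to add up versus what GDP records:

```python
# A sketch of the story's bookkeeping. "value" is the most the buyer would pay,
# "cost" is what producing the widget costs the seller, "price" is the market price.
def surpluses(value, cost, price):
    consumer = value - price          # buyer's gain from trade
    producer = price - cost           # seller's gain from trade
    return consumer, producer, consumer + producer

# Carol's machine is working: Alice (value 100) buys from Carol (cost 10) at 40.
print(surpluses(100, 10, 40))    # (60, 30, 90) -- but GDP only records the 40
# The machine burns down: Alice buys from Bob (cost 100 in labour) at 100.
print(surpluses(100, 100, 100))  # (0, 0, 0)    -- yet GDP now records 100
```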
I agree that there is a problem with the consumer surplus. It is too hard to discover. But the market price is actually irrelevant. Going with the number you can get, even though it doesn’t relate to what you want to know, is another kind of fake, in some ways worse.
Disclaimer: I’m not an economist. Corrections welcomed.
If that job’s available, why doesn’t he do it instead? If it’s not, what’s the point of focusing on his wishing—he might as well wish he were a millionaire.
The missing detail in your story is what Bob did to earn money while Carol’s machine was working. If he was doing something better than hand-making widgets, he wouldn’t go back to widgetry unless he could sell at a higher price. And if he was doing something less good than making widgets, he’s happy that Carol’s machine burned down.
Another point is that if Carol’s machine can make widgets more cheaply than Bob, then it might make more of them, satisfying more market demand. This should cause GDP to rise since it multiplies items sold by price. How common is the case of very inelastic demand (if that’s the right term)?
These points probably shouldn’t change your conclusion that GDP is often a bad measure.
Disclaimer: I’m even less of an economist than you are.
You’ll need to fix the start of your post: (emphasis mine):
That’s exactly what she had in mind since she would pay $100. I think it’s better to change it to “Alice would pay $95 for a widget”.
A couple of other points. Eve’s life is very slightly easier since prices of widgets change which means she can estimate part of the demand curve and then estimate the consumer surplus from there. Also, GDP and what it measures has nothing to do with the consumer surplus.
Perhaps she’d be willing to pay $100 for a widget if there were no other option, but would nonetheless prefer to pay less if that can be arranged.
Transaction and information costs are a huge problem. People spend a lot of time paying them in actual life — for instance, driving to work or to the store; standing in lines; searching for bargains; clipping coupons; and so on.
Alice is willing to incur an email round-trip time to distinguish a world where $100 widgets are the only offer from a world in which a $90 widget is also available. She considers that the delay of one round-trip of haggling is worth $10 times the probability of a lower offer existing.
(Other factors obtain, too, like minimizing regret — if she bought a widget for $100 and then immediately saw Bob sell one to Faye for $90, she’d feel like $10 worth of fool.)
Some of your fake numbers fall out of the common practice of shoehorning a partial order into the number line. Suppose you have some quality Foo relative to which you can compare, in a somewhat well-defined manner in at least some cases. For example, Foo = desirable: X is definitely more desirable to me than Y, or Foo = complex: software project A is definitely more complex than software project B. It’s not necessarily the case that any X and Y are comparable. It’s then tempting to invent a numerical notion of Foo-ness, and assign numerical values of Foo-ness to all things in such a way that your intuitive Foo-wise comparisons hold. The values turn out to be essentially arbitrary on their own, only their relative order is important.
(In mathematical terms, you have a finite (in practice) partially ordered set which you can always order-embed into the integers; if the set is dynamically growing, it’s even more convenient to order-embed it into the reals so you can always find intermediate values between existing ones).
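As a toy illustration of that embedding (the items and comparisons below are made up; only their relative order matters):

```python
# Toy sketch: embedding a finite partial order into the integers.
from graphlib import TopologicalSorter

# Each item maps to the set of items it is *definitely* more Foo than.
# Pairs that never appear (crossword vs ice_cream) are simply incomparable.
more_foo_than = {
    "crossword": set(),
    "ice_cream": set(),
    "holiday":   {"crossword", "ice_cream"},
}

# Any topological order respects every comparison we actually hold...
order = list(TopologicalSorter(more_foo_than).static_order())
# ...so assigning consecutive integers gives a "Foo score" consistent with them.
foo_score = {item: rank for rank, item in enumerate(order)}
print(foo_score)  # e.g. {'crossword': 0, 'ice_cream': 1, 'holiday': 2}
# Note: crossword and ice_cream were never compared, yet one now outranks the other.
```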
After this process, you end up with a linear order, so any X and Y are always comparable. It’s easy to forget that this may not have been the case in your intuition when you started out, because these new comparisons do not contradict any older comparisons that you held. If you had no firm opinion on comparative utility of eating ice-cream versus solving a crossword, it doesn’t seem a huge travesty that both activities now get a specific utility value and one of them outranks the other.
The advantages of this process are that Foo-ness is now a specific thing that can be estimated, used in calculations, reasoned about more flexibly, etc. The disadvantages, as you describe, are that the numbers are “fake”, they’re really just psychologically convenient markers in a linear order; and the enforced linearity may mislead you into oversimplifying the phenomenon and failing to investigate why the real partial order is the way it is.
Nitpick: utility is not just an ordering, it also has affine structure (relative intervals are preserved) because of preferences over lotteries. Software complexity is a valid example of your point, though. It’s like trying to measure the “largeness” of a thing without specifying whether we mean weight, volume, surface area, or something else.
Only among ideal agents, not among apes.
That is one version of the statement that human utility isn’t well defined, i.e., cousin_it’s point.
Hmm. It seems to me that your statement is directly contrary to cousin_it’s; if X isn’t well-defined, X can’t have an affine structure.
Yes, and it seems you agree that human utility doesn’t have an affine structure.
I am not convinced that preferences over lotteries are very meaningful; in particular, I am unconvinced that they are much related to preferences over outcomes.
Edit: Whoever downvoted: I’d like to hear arguments to convince me. If you have some, or know of some, please share. I’ve not seen any convincing ones. I have read the Sequences, and Rational Choice in an Uncertain World, and taken classes in statistics, etc., so I’m not just being contrary on the basis of nothing.
The argument is the Von Neumann–Morgenstern utility theorem:
I’m familiar with the VNM axioms.
This is false, because no one is obligated to agree to anything. If my preferences are such that they in some sense add up to a Dutch book, but then you actually offer me a bet (or set of bets, simultaneous or sequential) that constitute a Dutch book, you know what I can say?
“No. I decline the bet.”
Edit: Also, what if your values have incomparable quantities?
EDIT: I retract the claims in this comment. Given the revision made in the children they do not apply.
No, you aren’t. You may have heard of them but when you chose to start making claims about them you demonstrated that you do not know what they are. In particular:
None of the four axioms being discussed consist of or rely on that assumption. In fact, the whole point of the Von Neumann–Morgenstern utility theorem is that “has a utility function” is a property that can be concluded of any agent that meets those four axioms. If a unitary utility were assumed, the theorem wouldn’t need to exist.
You may, of course, deny that one of the axioms applies to you or you may deny that one of the axioms applies to rational agents. However, you will have difficulty persuading people of your controversial position after having already demonstrated your unfamiliarity with the subject matter.
Ok, it’s possible I’ve misunderstood. To see if I have, clarify something for me, please:
How would you represent the valuing of another agent’s values, using the VNM theorem? That is, let’s say I assign utility to (certain) other people having lots of utility. How would this be represented?
Edit: You know what, while the above question is still interesting, having reread the thread, I actually see the issue now, and it’s simpler. This line:
is indeed a misstatement (as it stands it is indeed incorrect for the reasons you state). It should be:
“Accepting the VNM axioms requires you to assume that everything can be reduced to a unitary “utility”.” (Which is to say, if you accept the axioms, you will be forced to conclude this; and also, assuming this leads you to the VNM axioms.)
If you find that reducing everything to a unitary utility then fails to describe your preferences over outcomes, you have a problem.
With the minor errata that ‘assume’ would best be replaced with ‘conclude’, ‘believe’ or ‘accept’ this revision seems accurate. For someone taking your position the most interesting thing about the VNM theory is that it prompts you to work out just which of the axioms you reject. One man’s modus ponens is another man’s modus tollens. The theory doesn’t care whether it is being used to conclude acceptance of the conclusion or rejection of one or more of the axioms.
Entirely agree. Humans, for example, are not remotely VNM coherent.
I have retracted my criticism via edit. One misstatement does not unfamiliarity make so even prior to your revision I suspect my criticism was overstated. Pardon me.
Thank you, and no offense taken.
Right. And the thing is, that if one were to argue that humans are thereby irrational, I would disagree. (Which is to say, I would not assent to defining rationality as constituting, or necessarily containing, adherence to VNM.)
Indeed. Incidentally, I suspect the axiom I would end up rejecting is continuity (axiom 3), but don’t quote me on that; I have to get my copy of Rational Choice in an Uncertain World out of storage (as I recall said book explains the implications of the VNM axioms quite well and I distinctly recall that my objections to VNM arose when reading it).
I tentatively agree. The decision system I tend toward modelling an idealised me as having contains an extra level of abstraction in order to generalise the VNM axioms and decision theory regarding utility maximisation principles to something that does allow the kind of system you are advocating (and which I don’t consider intrinsically irrational).
Simply put, if instead of having preferences for world-histories you have preferences for probability distributions of world-histories then doing the same math and reasoning gives you an entirely different but still clearly defined and abstractly-consequentialist way of interacting with lotteries. It means the agent is doing a different thing than maximising the mean of utility… it could, in effect, be maximising the mean subject to satisficing on a maximum probability of utility below a value.
It’s the way being inherently and coherently risk-averse (and similar non-mean optimisers) would work.
Such agents are coherent. It doesn’t matter much whether we call them irrational or not. If that is what they want to do then so be it.
That does seem to be the most likely axiom being rejected. At least that has been my intuition when I’ve considered how plausible not ‘expected’ utility maximisers seem to think.
You’re right, that question does seem interesting. Let me see...
I only ever apply values to entire world histories[1]. ie. Consider the entire wavefunction of the universe, which includes all of space, all of time, all Everett branches[2] and so forth. Different possible configurations of that universe are preferred over others on a basis that is entirely arbitrary. It so happens that my preferences over world histories do depend somewhat on computations about how the state of certain other people’s brains at certain times compares to the rest of the configuration of that world history. This preference is not different in nature from preferring histories which do not have lots of copies of wedrifid tortured for billions of years.
It also applies whether or not the other people I have altruistic preferences about happen to have utility functions at all. That’d probably make the math easier and the preference-preferences easier to instantiate but it isn’t necessary. Mind you I don’t necessarily care about all components of what make up their ‘utility function’ equally. I could perhaps assign negative weight to or ignore certain aspects of it on the basis of what caused those preferences.
Translating how strongly I prefer one history over another into a utility function occurs by the normal mechanism (ie. “require ‘VNM’; wedrifid.preferences.to_utility_function”). The altruistic values issue is orthogonal to the having-a-utility-function issue.
Of course, in practice I rely on and discuss much simpler things but this is from the perspective of considering the simpler models to be approximations of and simplifications of world-history preferences.
Ignore the branches part if you don’t believe in those—the difference isn’t of direct importance to the immediate question even though it has tangential relevance to your overall position.
The world is uncertain; all actions are gambles; refusing to choose an action is like refusing to let time pass.
But let’s pop up a level here and make sure we aren’t arguing about different things. Humans clearly aren’t rational, and don’t follow the VNM axioms. Are you arguing just that VNM axioms aren’t a good model for people, or arguing that VNM axioms aren’t a good model for rational agents?
Refusing to choose an action is not the same as taking a bet. If you are claiming that passively doing nothing can be equivalent to taking a Dutch Book bet, then I say to you what I said to shminux:
Concrete real-world example, please.
The latter. (The former is also true, of course.)
The VNM axioms assume that everything can be reduced to a unitary “utility”. If this isn’t the case, then you have a problem.
Yes, it is. You chose to do nothing, and doing nothing has consequences. You can’t keep the bomb from going off by not choosing which wire to cut.
How does that result in me being Dutch-booked, though?
I can’t construct one without straw manning a set of preferences that violate the VNM axioms. Give me preferences and I can construct an example.
That’s the conclusion of the theorem. Which of the premise do you disagree with?
By all means, straw man away. I won’t take it personally.
However, for an example — and please don’t feel compelled to restrict your answer only to this — there’s a variant of the good old chickens/grandmother example:
Let us say I prefer the nonextinction of chickens to their extinction (that is, I would choose not to murder all chickens, or any chickens, all else being equal). I also prefer my grandmother remaining alive to my grandmother dying. Finally, I prefer the deaths of arbitrary numbers of chickens, taking place with any probability, to any probability of my grandmother dying.
I believe this violates VNM. (Am I wrong there?) At least, I don’t see how one would construe these preferences as those of an agent who is acting to maximize a quantity.
See this thread for clarification/correction.
Those preferences do not violate the VNM axioms. The willingness to kill chickens in order to eliminate an arbitrary small chance of the death of your grandmother makes the preferences a bit weird, but still VNM compliant.
A live chicken might quantum tunnel from China to your grandmother’s house and eat all her heart medication. But then again, a quantum tunneling chicken could save your grandmother from tripping on fallen corn-seed. The net effect of chickens on your grandmother might be unfathomably small*, but it is unlikely to ever be zero. If chickens never have zero effect on your grandmother, then your preference for the non-extinction of chickens would never apply (EDIT: and so the only preferences we would need to consider would be over your grandmother’s life, which could be represented with utilities, say 0 for dead and 1 for alive).
If you’re willing to tolerate a 1⁄10,000,000 increase in chance of your grandmother’s death to save chickens from extinction, you still have VNM-rational preferences. Here’s a utility function for that:
dead-chicken-dead-grandma 0
live-chicken-dead-grandma 1
dead-chicken-live-grandma 10,000,000
live-chicken-live-grandma 10,000,001
*Or perhaps not so small. 90% of all flu deaths happen to people age 65 or older, and chickens are the main reservoir of influenza.
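As a quick check that the utilities above do encode that trade-off (a minimal sketch which ignores baseline risks, since those are the same under both options):

```python
# Expected utility of sparing the chickens at an extra 1/10,000,000 risk to grandma,
# versus letting chickens go extinct, using the four utilities listed above.
from fractions import Fraction

U = {("dead_chicken", "dead_grandma"): 0,
     ("live_chicken", "dead_grandma"): 1,
     ("dead_chicken", "live_grandma"): 10_000_000,
     ("live_chicken", "live_grandma"): 10_000_001}

p = Fraction(1, 10_000_000)   # extra risk to grandma from sparing the chickens

save_chickens = (1 - p) * U[("live_chicken", "live_grandma")] + p * U[("live_chicken", "dead_grandma")]
extinct_chickens = U[("dead_chicken", "live_grandma")]

print(save_chickens, extinct_chickens)  # both 10,000,000: exactly indifferent at this risk,
                                        # and sparing chickens wins for any smaller risk.
```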
I am not willing, no. Still VNM-compliant?
Yep. But you probably don’t care about chickens.
If, by some cosmic happenstance, the effect of a chicken’s life or death on your grandmother were exactly zero—no gravitational effects of the Australian chicken’s beak on your grandmother’s heart, no nothin’—then the utilities get more complicated. If the smallest possible non-zero probability a chicken could kill or save your grandmother were 10^-2,321,832,934,903, then the utilities could be something like:
dead-chicken-dead-grandma 0
live-chicken-dead-grandma 1
dead-chicken-live-grandma 10^2,321,832,934,903 + 1
live-chicken-live-grandma 10^2,321,832,934,903 + 2
It seems more likely that he really isn’t VNM-compliant. Chickens are tasty and nutritious, 1⁄1,000,000 is a small number and, let’s face it, grandparents of adults are already old and have much more chance than that of dying every day. It would be surprising if Said is so perfectly indifferent to chickens, especially since he’s already been explicitly telling us that he isn’t VNM-compliant.
If you are making a theoretical claim about models for rational agents then you are not entitled to that particular proof.
What I was asking for was an existence proof (i.e., an example) for a claim being made about how the theory plays out in the real world. For such a claim, I most certainly am entitled to that particular proof.
I repeat the assertion from the grandparent. You made this demand multiple times in support of a point that you explicitly declared was about the behaviour of theoretical rational agents in the abstract. Please see the quotes provided in the grandparent. You are not entitled to real-world existence proofs.
If you wish to weaken your claim such that it only applies to humans in the real world then your demand becomes coherent. As it stands it is a non sequitur and logically rude.
I requested an existence proof in response to a claim. The fact that I was making a point in the same post is irrelevant. Any claim I was making is irrelevant.
I don’t think that this is a valid escape clause. You don’t normally know that there is a bet going on. You just live your life, make small decisions every day, evaluating risks and rewards. It works just fine as long as no one is on your case. But if someone who knows of your vulnerability to Dutch booking can nudge the situations you find yourself in in the desired direction, you eventually end up back where you started, but with, say, some of your money inexplicably lost. And then it happens again, and again. And you are powerless to change the situation, since the decisions still have to be made, or else you would be lying in bed all day waiting for the end (which might be what the adversary intended, anyway).
We discussed that a few days ago. As far as I know, De Finetti’s justification of probabilities only works in the exact scenario where agents must publish their beliefs and can’t refuse either side of the bet. I’d love to see a version for more general scenarios, of course.
Concrete real-world example, please.
In what sense are preferences over lotteries less meaningful than preferences over pure outcomes?
I don’t think you can determine preferences over outcomes from preferences over lotteries; or if you can, then it’s nowhere near as straightforward as multiplying by probabilities, etc. To put it another way, I don’t think preferences over lotteries represent preferences over outcomes in any meaningful way.
On this topic: http://en.wikipedia.org/wiki/Level_of_measurement
Assuming you broadly subscribe to the notion of “true numbers” and “fake numbers”, how do you classify the following?
Food calories
The position of an object’s centre of mass
The equilibrium price
A population’s carrying capacity
The population mean
Some anecdotal and meandering gubbins.
I am occasionally called upon to defend what an interlocutor thinks of as a “fake number”. The way I typically do this is to think of some other measure, parameter, or abstraction my interlocutor doesn’t think is “fake”, but has analogous characteristics to the “fake number” in question, and proceed with an argument from parallel reasoning.
(Centre of mass is actually a very good go-to candidate for this, because most people are satisfied it’s a “real”, “physical” thing, but have their intuitions violated when they discover their centre of mass can exist outside of their body volume. If this isn’t obvious, try and visualise the centre of mass of a toroid.)
Some of the above are this sort of parallel measure I have used in the past, either with each other or with some other measures and values mentioned in comments on this post. I’m quite pleased at how divisive some of them are, though I’m surprised at the near-unanimity of food calories, which I would have expected to be more politicised.
My conclusions to date have been that the “reality” of a measure has a strong psychological component, which is strongly formed by peoples’ intuitions about, and exposure to, abstract concepts with robust, well-understood or useful behaviour.
How is the fact that the center of mass can exist outside the body volume supposed to make center of mass “fake” in any way shape or form?
To clarify, my position is that whether someone finds an abstract measure to be “fake” or not depends on how comfortable they are with abstractions. Some abstractions wear very concrete disguises, and people are generally fine with these.
My experience is that people’s folk-physics interpretation of their centre of mass is that it’s an actual part of their body that the rest of their body moves around. A lot of dancers, for example, will talk about their “core”, their “centre” and their “centre of gravity” as interchangeable concepts. When confronted with the idea that it’s an abstraction which can be located outside of their body, they’re often forced to concede that this intangible thing is nonetheless an important and useful concept. If that’s true of centre of mass, maybe they should think a little bit harder about market equilibria or standard deviation.
As Dan pointed out here cousin_it’s definition of “fake number” has a very concrete meaning. The most your example shows is that the folk-physics notion “center of gravity” as opposed to the actual physics notion is a fake number.
While his definition of “true number” is fairly concrete, his definition of “fake number” is less so, and importantly is not disjoint with “true number”.
ETA: Having thought about it a bit and looked over it, I’m fairly sure we’re just talking past each other and there’s no coherent point of dispute in this discussion. I suggest we stop having it.
Some of these things have formal, mathematical or empirical definitions and that’s the gold standard for a true number. I don’t want to point out which ones, as that would ruin the poll.
Why do those criteria form a gold standard for a “true” number? I can invent formal, mathematical or empirical definitions for things all day long that don’t correspond to anything useful or meaningful or remotely “real” outside of the most tautological sense.
Let’s say the Woowah of a sample is the log of its mean minus the lowest number in the sample. If I sample the heights of fifty men aged 18-30, that sample has a mathematically well-defined Woowah. There’s even an underlying true value for the Woowah of the population, but so what? When I talk about the Woowah, I’m not really talking about anything.
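For concreteness, here is the Woowah as code (the sample below is made up); being computable and repeatable doesn’t make it about anything:

```python
# The "Woowah" of a sample: log of its mean, minus the lowest number in the sample.
# Perfectly well-defined, perfectly meaningless.
import math

def woowah(sample):
    return math.log(sum(sample) / len(sample)) - min(sample)

heights_cm = [172.0, 168.5, 181.2, 176.4, 169.9]  # made-up sample of heights
print(woowah(heights_cm))  # a precise, repeatable number that refers to nothing
```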
Meanwhile a population’s carrying capacity (currently the most “fake” number in the poll) isn’t something I can directly observe. I have to infer it through observation. (For all practical purposes I have to infer the Woowah through observation too, but in principle I could sample the entire population and find the true Woowah). I can’t directly measure the carrying capacity because it isn’t a direct property of the population. It’s a parameter in a model of the population which happens to refer to a property that population would have in a specific and probably counterfactual case. It’s questionable whether there is an underlying “true” population carrying capacity, but it’s definitely talking about something important and meaningful.
The OP’s definition of a “true” number isn’t that it’s useful, meaningful or corresponding to something “real”. It’s merely that it’s objectively measurable and actually measured.
But why is this an interesting property that’s worthy of consideration?
It’s a specific failure mode that’s useful to talk about because it might let us recognize real-world failure in some “false” numbers. That’s not intended to imply there aren’t other failure modes; it’s not a sufficient test for the quality of a ‘number’.
A formal or mathematical definition isn’t good enough if its inputs can’t be similarly computed.
That’s true. And only some of those things can be computed, and some are inexact. But I’m surprised that some people voted for ‘fake’ on every item except for the center of mass (only one vote for ‘fake’).
A while has passed since the poll, so I guess we can discus this now:
Food calories: the energy that human metabolism can free from a food; 1 food calorie (kcal) ≈ 4184 J. The definition is precise, and the energy content can be measured with a calorimeter.
Digestion and metabolism can deviate from that definition. Wikipedia notes that “alterations in the structure of the material consumed can cause modifications in the amount of energy that can be derived from the food; i.e. caloric value depends on the surface area and volume of a food.” People also have individual metabolic differences. Still, it’s a useful unit of measure.
The position of an object’s centre of mass has a strict definition. To calculate it you need to know the distribution of mass in an object, but you do know it for many objects.
The population mean is defined by Wikipedia as equal to the arithmetic mean over the whole population—for a finite population. That seems simple enough that I’m confused that some people voted it was ‘fake’.
It can be a fake number if we have no practical way to survey every member of the population.
Thank you for posting that poll! Also, here’s some ideas to tell if a quantity is closer to “true” or “fake”:
Can you find many real-world measurements of that quantity on the internet?
Are there multiple independent methods of measuring that quantity that all give the same result?
If two of these methods disagree, will people say that they’re equally valid, or will they say that one must be wrong and try to find the error?
I’m not personally sold on “true” and “fake” numbers, but I do think it’s going in an interesting direction.
Yet another is “productivity”. In fact, most of software engineering consists of discussions of fake numbers. :/ This article (pdf) discusses that rather nicely.
Yeah, that’s also a great example. Thanks!
Why is such precision required for something to count as a ‘measurable quantity’? Depending on how you do the measurements, measurements of (e.g.) prices don’t always agree to two decimal places, let alone three.
Sure, though IQ is already a triumph of psychometrics, and Stanovich is working on the first RQ (rationality quotient) test.
Not sure why all the obsession with ordering people from best to worst. Some people don’t think quickly on their feet, or are bad test takers, etc. Can’t we just look at what people have done?
If you can find a good way to put a number on “what people have done”, sure. But if not, I need a way to answer questions like “would giving people chlorine supplements make them more effective at achieving their goals”, and score on IQ test is a) easy to calculate b) at least somewhat correlated with e.g. doing well at one’s job.
That’s actually a bad reason (see the Streetlight Fallacy or the availability bias in general).
Well, once you look at things a bit more subtly, the reason doesn’t seem quite so bad anymore. There is a balance to be struck between the accuracy of the measure and the easiness of calculating it. The most accurate measure is useless if you can’t calculate it. So what’s wrong with using a less accurate measure (but it still works to some extent—you mustn’t look at a) in isolation from b)) which you can, in fact, calculate?
The need for balance is a fair point. But then you should make decisions about what to measure on the basis of an explicit trade-off between “easier to get” and “more relevant”—effectively you are using an estimate for the quantity you are really interested in and so you need some support for the notion that your estimate is a reasonable one.
The thing that makes IQ tests work is that these variables correlate to each other.
Sure, but (a) correlation isn’t transitive, and (b) some very smart impressive people do poorly on metrics like IQ tests.
I think cumulative lifetime output is better than IQ. Yes the former isn’t directly comparable across cohorts, and there are a million other complications. But trivial metrics like IQ seem to me to be looking under a streetlight.
Google did some experiments on measurable ways to do interviews (puzzles, etc.) and found no effect on hire quality. Undue insistence on proxies over real parameters is a failure mode, imo.
Unsurprising due to range restriction—by the time you’re interviewing with Google, you’ve gone through tons of filters (especially if you’re a Stanford grad). This is the same reason that when people look at samples of elite scientists, IQ tends to not be as important a factor as one would expect—because they’re all smart—and other things like personality factors start to correlate more.
EDIT: this may be related to Spearman’s law of diminishing returns
I am just saying that for people who are capable of doing more than flipping burgers (which probably starts well before a single sigma out from the mean), we should just look at what they did.
This approach has the advantage of not counting highly the kind of people who may place well on tests, etc. due to good hardware, but who, due to poor habits or whatever other reason, end up not living up to their potential.
Similarly, this approach highlights that creative output is often not comparable. Is Van Gogh “better” than Shakespeare? A silly question.
I don’t disagree that IQ tests are useful for some things for folks within a sigma of the mean, and I also agree with the consensus that tests start to fail for smart folks, and we need better models then.
If the average IQ of LW is really around 140, then I think we should talk about the neat things we have done, and not the average IQ of LW. :)
Tests are often used to decide what to allow people to do, so you can’t rely on what they’ve done already. When testing applicants to college, they don’t often have a significant history of doing.
But they only hire at the top, so one would expect the subsequent performance of their hires to be little correlated with any sort of interview assessments.
Toy example: 0.8 correlation between two variables, select on one at 3 or more s.d.s above the mean, correlation within that subpopulation is around 0.2 to 0.45 (it varies a lot, even in a sample of 100000).
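A quick simulation of that toy example (a sketch; the exact subsample correlation wobbles a lot from run to run, which is part of the point):

```python
# Range restriction: r = 0.8 in the population, then select at >= 3 sd on one variable.
import numpy as np

rng = np.random.default_rng(0)
n, r = 100_000, 0.8
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)  # corr(x, y) = 0.8 by construction

print(np.corrcoef(x, y)[0, 1])            # ~0.80 in the full sample
top = x >= 3                               # roughly a hundred-odd people out of 100,000
print(np.corrcoef(x[top], y[top])[0, 1])   # much lower and noisy: roughly 0.2 to 0.45
```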
Of course outliers exist, they’re the exceptions that demonstrate the rule.
Besides, how do you even define “cumulative lifetime output”? Galton tried doing this at first, then realized it was impossible to make rigorous, which led to his proto-IQ tests in the first place.
I think if the real parameter is hard to measure, or maybe is actually multiple parameters, the correct answer is to think about modeling harder, not to insist that a dumb model is what we should use.
In less quantitative fields that need to be a little quantitative to publish they have a silly habit of slapping a linear regression model on their problem and calling it a day.
Papers, books, paintings, creating output? Do you think Van Gogh and his ilk would do well on an IQ test?
Cumulative lifetime output doesn’t seem very useful, though. For one thing, it’s only measurable for dead or near-dead people...
???
Cumulative just means “what you have done so far.”
You’re right, of course. Nevermind. Though the problem of measuring it for someone who hasn’t yet had the chance to do much remains.
Expected cumulative lifetime output, then.
Two papers per year * 30 years of productive career = 60 papers… :-(
Most people, unlike you (according to your name, at least), are not paper machines.
Compare someone who works in a large department that values number of papers published and number of grants secured but doesn’t particularly care about quality of work, and so publishes four papers a year of poor quality, which are occasionally cited, but only by direct colleagues, with Douglas Hofstadter, who rarely publishes anything but whose first work has been immensely influential: you’re going to get a worse picture than if you had just used IQ.
Heh, I suppose that is one of the alternative readings of my handle.
Only four? Why, I know some (who will remain nameless) that published eight or ten papers last year alone.
But of course Goodhart’s law ruins everything.
For which purpose?
For determining if someone is a giant, or a midget in giant’s clothing.
That’s not really useful. I don’t even know in which context are we talking about these things. Is it about hiring someone? Is it about deciding on whether someone’s work is “worthy” to be in a museum, or published, or something else? It is about admitting people to your social circle? Is it about generally ranking all people on some scale just because we can?
But that’s a test you can only run once somebody is dead, so it’s not very useful.
Anyone who is using IQ as a proxy for the expected value of allying with the measured person (e.g. befriending them; employing them) is making a vast mistake, yes. We should expect moral factors such as trustworthiness, and (for many sorts of alliance) emotional factors such as warmth, to have a pretty huge effect on the value of an alliance.
“Anne is smarter than Becky. Which one do you want as your best friend?”
“I don’t know, is Anne an asshole?”
This has been studied and the answer is that while conscientiousness/honesty is the second most useful trait to measure when hiring, it is less valuable than measuring IQ.
But how can someone be trustworthy without intelligence? Even if they want to do what’s best, they can’t be relied on to do what’s best if they can’t figure out what the best thing to do is. Generally speaking, the more intelligent someone is, the more predictable they are (there are a few exceptions, such as where a mixed strategy is optimal). The fact is that idiots can often screw things up more than selfish people. With a selfish person, all you have to worry about is “Is it in their interests to do what I want?” And if you’re not worrying about that to begin with, then perhaps you are being selfish.
Intelligence is clearly correlated with expected value, and it’s definitely better than nothing at all. Furthermore, smart people are better than stupid people at convincing you that they’re smart. But honest people are often worse than dishonest people at convincing people that they’re honest.
A lot of this seems extremely contrary to my intuitions.
Poor performance (for instance on tests) isn’t the result of having a high rate of random errors, but of exhibiting repeatable bugs. This means that people with worse performance will be more predictable, not less — in order to predict the better performance, you have to actually look at the universe more, whereas to predict the worse performance you only have to look at the agent’s move history.
(For that matter, we can expect this from Bayes: if you’re learning poorly from your environment, you’re not updating, which means you’re generating behaviors based more on your priors alone.)
This seems to be a political tenet or tribal banner, not a self-evident fact.
(Worse, it borders on the “intelligence leads necessarily to goodness” meme, which is a serious threat to AI safety. A more intelligent agent is better equipped to achieve its goals, but is not necessarily better to have around to achieve your goals if those are not the same.)
By more predictable, I meant greater accuracy in predicting, not that less computing power is required to predict. Someone who performs well on tests is perfectly predictable: they always get the right answer. Someone with poor performance can’t be any more predictable than that, and is often less.
Just because the bug model has some value doesn’t mean that the error model has none. I would be surprised if a poorly performing student, given a test twice, were to give exactly the same wrong answers both times. I don’t understand your claim that people with worse performance would be more predictable. Given that someone is a good performer, all you need to do is solve the problem yourself, and assuming you did it correctly, you now know how that person would answer. To predict the worse performer, the move history is woefully inadequate. Poor performance is deterministic like a dice throw is deterministic. You need to know what their bugs are, what the exact conditions are, and how they’re approaching the problem.

Someone who is using math will correctly evaluate 5(2+8) regardless of whether they find 2+8 first and then multiply by 5, or find 5·2 and 5·8 and add them together. But someone who doesn’t understand math will likely not only get the wrong answer, but get a different wrong answer depending on how they do the problem. Or just give up and give a random number. Just knowing how they did the problem before doesn’t tell you how they will do that exact problem in the future, and it certainly doesn’t allow you to extrapolate how they will do on other problems. If someone is doing math correctly, it doesn’t matter how they are implementing the math. But if they are doing it incorrectly, there are lots of different ways they can be doing it incorrectly, and given any particular problem, there are different wrong ways that get the same answer on that problem, but different answers on different problems. So just knowing what they got on one problem doesn’t distinguish between different wrong implementations.
Learning poorly from your environment does not mean not updating, it means that you are updating poorly. Given the problem “d = rt, d = 20, r = 5”, if you tell a poor learner that the correct procedure is to divide 20 by 5 and get t = 4, then given the problem “d = rt, r = 6, t = 2″, they will likely divide 6 by 2 and get d = 3. They have observed that “divide the first number by the second one” is the correct procedure in one case, and incorrectly updated the prior on “always divide the first number by the second one”. To know what rule they’ve “learned”, you have to know what cases they’ve previously seen.
Good learners don’t learn rules by Bayesian updating. They don’t learn “if you’re given d and r, you get t by dividing” by mindlessly observing instances and updating every time it gives the right answer. They learn it by understanding it. To know what rule a good learner has learned, you just need to know the correct rule; you don’t need to know what cases they’re seen.
That there are some cases where idiots can screw things up more than selfish people is rather self-evident. “Can” does not border on “necessarily will”. Intelligence doesn’t lead to goodness in the sense of more desire to do good, but it does generally lead to goodness in the sense of more good being done.
The whole point of an alliance is that you’re supposed to work together towards a common goal. If you’re trying to find stupid people so that you can have the upper hand in your dealings with them, that suggests that this isn’t really an “alliance”.
Let me throw in what might be a useful term: “unobservable”.
Take, for example, the standard deviation of a time series. We can certainly make estimates of it, but the actual volatility is unobservable directly, we can only see its effects. A large chunk of statistics is, in fact, dedicated to making estimates of unobservable quantities and figuring out whether these estimates are any good.
Another useful term is “well-defined”. For example, look at inflation. Inflation in general (defined as “change in prices”, more or less) is not well-defined and different people can (and do) propose various ways to quantify it. But if you take one specific measure, say in the US a particular CPI and define it as a number that comes out of specific procedure that the BLS performs every month, then it becomes well-defined.
Just to nitpick, the standard deviation of a time series is not even well-defined unless we know that the series is stationary. In Shalizi’s words, “if you want someone to solve the problem of induction, the philosophy department is down the stairs and to the left”. If it were well-defined (e.g. if the time series were coming from some physical process with rigidly specified parameters), it would be just as observable as the mass of the moon, i.e. indirectly. That would fit my criteria for a “true number”.
I guess that for me a “true number” has to be a well-defined number that you can measure in multiple ways and get the same result, so inflation is out because it’s not well-defined, and CPI is out because it’s just one method of measurement that doesn’t agree with anything else.
I suspect this distinction between “real” and “fake” numbers is blurrier than you are describing.
Consider voltage in classical physics. Differences in voltages are a real measurable quantity. But the “absolute” voltage at a given point is a mathematical fiction.
Or consider Kolmogorov complexity. It’s only defined once you fix a specific Turing machine (which researchers rarely bother to do.) And even then, it’s not decidable. Is that a real number or a fake number?
The distinction might be blurry, but I don’t think it’s blurrier for that particular reason :-)
Sure, to measure voltage or K-complexity you need to choose a scale. But the same is true for mass (kilograms or pounds, related by a scaling factor), temperature (Celsius or Fahrenheit, related by a translation and scaling), spacetime coordinates (dependent on position and velocity of origin), etc. You just choose a scale and then you’re done. With a fake number, on the other hand, you don’t know how to measure it even if you had a scale.
K-complexity isn’t really a matter of scale. Give me a program, and I can design a Turing machine that can implement it in one symbol.
For any two given Turing machines, you can find some constant so that the K-complexity of a program in terms of each Turing machine is within that constant, but it’s not like they’re off by that constant exactly. In fact, it’s impossible to do that.
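For reference, the invariance theorem only pins this down up to an additive constant: for any two universal machines $U$ and $V$ there is a constant $c_{U,V}$, depending on the machines but not on the string, such that

$$|K_U(x) - K_V(x)| \le c_{U,V} \quad \text{for every string } x,$$

and nothing stronger (such as an exact offset) holds in general.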
Also, he gave two reasons. You only talked about the first.
Yeah, I agree that K-complexity is annoyingly relative. If there were something more absolute that could do the same job, I’d adopt it without a second thought, because it would be more “true” and less “fake” :-) And I feel the same way about Bayesian priors, for similar reasons.
I feel like there’s a meaningful distinction here, but calling them ‘true’ and ‘fake’ smuggles in connotations that I don’t feel are accurate.
So. Entropy: real or fake number?
… even if you don’t know about quantum mechanics?
To make it worse, multiplicity!
Why would it be fake? If you ask several physicists to measure the specific entropy of some material, they will all get the same answer. If one gets a different answer, they will look for the error.
Why would entropy be a fake number? As far as I know, physicists happily measure the specific entropies of different materials. It doesn’t seem very different from temperature or energy.
You can measure free energies. You can measure entropy differences. Entropy itself is pretty slippery. If you don’t go down to a quantum level, there’s literally no way to say just how much entropy there is in a system, because you can’t determine how fine to slice it. (In quantum mechanics, the answer is ‘h-bar’)
Even if you’re aware of quantum mechanics, in many systems there is the question of whether to include degrees of freedom that could possibly become involved but won’t (with thermodynamically large probability) on the relevant timescales.
I instantly disliked your terminology of true vs fake. If I understand it correctly, you are making a distinction between widely agreed upon quantification procedures (like measuring weight) and those which are either not well defined or contentious (cheeseburger surplus). You do not seem to stipulate that the output of these procedures be useful in any way, but maybe it is implied? Anyway, I would call these “metrics”, not “numbers”.
If I were to try to quantify the “trueness” of these “numbers”, I would look into repeatability of the quantification procedures and their robustness to small deviations (an equivalent of well-posedness in mathematics) and, separately, their acceptance level, i.e. the fraction of the experts in the area who use the same procedure. This gives you at least two separate dimensions, and I am sure there are others I overlooked. Additionally, only useful metrics are worth putting an effort in, though I am not sure how to quantify usefulness in a repeatable, acceptable and useful way (trying to be reflectively consistent here).
In summary, I would first identify which metrics would be useful, then figure out several robust ways to design them, then work on the acceptance level.
Measuring weight is not a single procedure, it’s many procedures that agree, because they’re measuring something that “exists” in some sense. So I’d go the other way round, and first try to figure out which quantities “exist”, regardless of usefulness. That’s how electricity was discovered, it was pretty useless at first.
Also this text by Lawrence Kesteloot might be relevant.
I suspect that your definition of “exists” is circular. How can we tell if something exists other than by having “procedures that agree”?
What was discovered is “many procedures that agree”, like rubbing some materials resulting in sparks, similar sparks between objects during a thunderstorm, etc. These were eventually abstracted into the concept of electricity, which ended up being very useful in its predictive power and practical applications.
...And you do that by finding the “procedures that agree”.
Oh, and I find the content of your link rather presumptuous due to its failure of imagination. I can easily imagine that “their” equivalent of math, physics or comp sci has nothing in common with ours.
Yeah, now I agree with your point.
I believe the best example of ‘fake numbers’ may be the measurement of IQ. The problem with this sort of fake number is that we cannot be certain IQ really represents our intellectual capacity, yet people still use it to judge others, or even to justify their studies, rather than treating it as a simple reference point.
Fake numbers also seem to prevail in our professional lives, as technology lets companies quantify people’s labor. These may be good estimates, but that kind of numerical fixation affects people’s minds tremendously: the moment the numbers are revealed, they start controlling the people. They don’t stay mere measurements reflecting the phenomena they were gathered from.
By the standard of reproducibility using different methods, IQ is assuredly real; there are many varieties of IQ test, and their results mostly agree.
If a proposed test didn’t agree with the existing ones, it wouldn’t be used as an IQ test.
I’m not certain how true this is. It’s not exactly the same thing, but Dalliard discusses something similar here (see section “Shalizi’s first error”). Specifically, a number of IQ tests have been designed with the intention that they would not produce a positive manifold (which would I think to at least some extent imply not agreeing with existing tests). Instead they end up producing a positive manifold and mostly agreeing with existing tests.
Again, this isn’t exactly the same thing, because it’s not like they were intended to produce a single number that disagreed with existing tests, so much as to go beyond the single-number-IQ model. Also, it’s possible that even though they were in some sense designed to agree with existing tests, they only get used because they instead agree (but for CAS this appears to be false (at least going by the article) and for some of the others it doesn’t apply). Still, it’s similar enough that I thought it was worth noting.
Coincidentally, I just came across this, probably already well-known and discussed on LW, which includes claims that rationality and intelligence “often have very little to do with each other”, and that “it is as malleable as intelligence and possibly much more so”. So there’s a clearly cognitive skill (rationality) claimed to be distinct from and not closely correlated with g.
Has evidence yet emerged to judge these claims? The article is less than a year old and is about the start of a 3-year project, so perhaps none yet.
If it correlated better with the vast battery of non-IQ-tests which nevertheless exhibit a positive manifold, it’d be used.
Hmm, it’s tricky. If new IQ tests are chosen based on correlations with many existing non-IQ tests, doesn’t that also make it more likely that different IQ tests will agree with each other? And do we see more agreement in practice than we’d expect apriori, given that selection process? (I have no idea and hadn’t thought of this question until now, thanks!)
Thinking about the geometry of that, it seems to me that any two things that correlate positively with all of a “vast battery of non-IQ-tests” are likely to correlate positively with each other, and the more diverse (i.e. less correlated with each other) that battery is, the tighter the constraint it places on the set of things that correlate with all of them.
Furthermore, if it correlates better with that battery than existing IQ tests do, it again seems, just from visualising the geometry, that it will correlate worse with the IQ tests than they do with each other.
But my intuition may be faulty. Do you have a concrete example of the phenomenon?
Indeed. To the extent that existing IQ tests are not terrible & crappy (and, y’know, they aren’t), any better IQ test is still going to correlate heavily with the old ones.
If a proposed scale didn’t agree with existing ones, it wouldn’t be used to measure mass.
Of course. Your point?
This means that your comment here isn’t actually evidence that IQ is not measurable.
It wasn’t intended to be. Were you pattern-matching against debates with other people?
(Late reply; knocked out by a cold all week.)
ETA: Having just read Sniffnoy’s comment and the useful article linked there, I have a better appreciation of the context around this issue. Some people have made the observation I made as such an argument.
I’m sure that in AI research many programs have been written around a specific well-defined utility function. Or, by “utility” you mean utility for a human? The “complexity of value” thesis is that the latter is very hard to define / measure. I’m not sure it makes it a “bad” concept.
The corresponding object in reinforcement learning systems is usually called a reward function, by analogy with behaviorist psychology; the fitness function of genetic programming is also related.
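As a minimal illustration of how concrete such a quantity can be in code, here is a toy reward function for an invented gridworld; the environment, states, and numbers are all made up for the example.

```python
# A fully specified numeric "utility" in the reinforcement-learning sense:
# a reward function over transitions in a toy gridworld.
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    x: int
    y: int

GOAL = State(4, 4)
PIT = State(2, 3)

def reward(state: State, action: str, next_state: State) -> float:
    """Return the reward for taking `action` in `state` and landing in `next_state`."""
    if next_state == GOAL:
        return 10.0   # reaching the goal
    if next_state == PIT:
        return -10.0  # falling into the pit
    return -0.1       # small step cost to encourage short paths

# Example: the agent steps onto the goal square.
print(reward(State(3, 4), "right", State(4, 4)))  # -> 10.0
```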
It is interesting that you choose mass as your prototypical “true” number. You say we can “measure” the mass of a person or car. This is true in the sense that we have a complex physical model of reality, and at one of the most superficial levels of this model (Newtonian mechanics) there exist some abstract numbers which characterise the motions of “objects” in response to “forces”. So “measuring” mass seems to only mean that we collect some data, fit this Newtonian model to that data, and extract relatively precise values for this parameter we call “mass”.
Most of your examples of “fake” numbers seem to me to be definable in exactly analogous terms. Your main gripe seems to be that different people try to use the same word to describe parameters in different models, or perhaps that there do not even exist mathematical models for some of them; do you agree? To use a fun phrase I saw recently, the problem is that we are wasting time on “linguistic warfare” when we should be busy building better models?
Yeah, that sounds right. You could say that a “true” number is a model parameter that fits the observed data well.
Perhaps, though, you could argue it differently. I have been trying to understand so-called “operational” subjective statistical methods recently (as advocated by Frank Lad and his friends), and he insists on only calling a thing a [meaningful, I guess] “quantity” when there is some well-defined operational procedure for measuring what it is. For him, “measuring” does not rely on a model; he is referring to reading numbers off some device or other, I think. I don’t quite understand him yet, since it seems to me that the numbers reported by devices all rely on some model or other to define them, but maybe one can argue their way out of this...
I don’t think that dividing reality into true numbers and fake numbers is very useful.
I think it’s more useful to ask yourself whether you can measure, with reasonable effort, something that’s a good predictor of some outcome you care about. You should also ask whether it continues to be a good predictor if you optimise towards it.
Discussing whether IQ is the real measure of intelligence is irrelevant. The important question is whether it predicts performance on tasks you care about.
Apart from seeking quantities that are good predictors, it also makes sense to seek quantities that are easy to measure. If you have a way to gather a lot of data cheaply, you can afterwards analyse what you can predict with it. That’s one of the reasons I always try to talk someone into doing the work of turning Anki review data into daily cognitive measurement scores. I don’t care whether the cognitive measurement from the Anki data is more or less real than IQ.
If I get it calculated for every day without having to spend additional time, that’s very valuable. IQ tests are relatively expensive because they take time, and if you take the same test multiple times you train it in a way that makes it useless. We can also analyse whether it provides a good predictor for other outcomes we care about.
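A sketch of what such a daily score could look like: the code below assumes Anki’s standard revlog table (id = review timestamp in milliseconds, ease = the 1–4 button pressed, time = answer time in milliseconds), and the scoring formula itself is purely illustrative, not a validated measure.

```python
# Sketch: turn Anki review logs into one score per day.
# Assumes the standard revlog table in collection.anki2.
import sqlite3
from collections import defaultdict
from datetime import datetime, timezone

def daily_scores(collection_path: str) -> dict:
    con = sqlite3.connect(collection_path)
    rows = con.execute("SELECT id, ease, time FROM revlog").fetchall()
    con.close()

    by_day = defaultdict(list)
    for review_ms, ease, answer_ms in rows:
        day = datetime.fromtimestamp(review_ms / 1000, tz=timezone.utc).date()
        # Illustrative score: reward correct/easy answers, penalise slow ones.
        by_day[day].append(ease - answer_ms / 10000)

    return {day: sum(scores) / len(scores) for day, scores in by_day.items()}
```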
I think you’re coming from a different perspective. You care whether a quantity is easy to measure and whether it’s a good predictor of something that’s useful in practice. I don’t care very much about that, because in pre-paradigmatic fields it’s too early to ask for practical applications anyway. Instead I care whether the quantity can serve as a good building block for future research, and for that it needs to be a hard number rather than a soft one, so to speak. (Maybe I should’ve called them hard vs soft, instead of true vs fake.)
Whether something is easy to measure matters a lot for whether it’s a good building block for future research. If something is easy to measure and is a good predictor of other qualities, it provides a good building block for further research.
Easy to measure means that you can do research and study how the variable interacts with other variables. That’s the core of research.
That means you care about how much random and systematic noise the measurement has, but you don’t have to ask about its “realness”.
“Does having this measurement tell you about more degrees of freedom of the system?” is a better question than asking whether the measurement is real.
If you focus on a variable that seems more real to you, but for which it’s hard to gather data, it can’t serve as a good building block for future research: acquiring the data is expensive, which makes the research expensive.
If you want to further research, you want variables that are cheap to measure with low noise, and which add degrees of freedom that you don’t already get from other variables you can easily access.
In theory you might have 10 easy-to-measure data points, run principal component analysis, and find that you have 5 “real variables”. It doesn’t make sense to focus at the start on the 5 real variables. It makes much more sense to focus on easy-to-measure variables that add information.
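A minimal sketch of that idea, with invented numbers and an invented generating process: simulate 10 noisy, easy-to-measure variables driven by 5 underlying factors, and check how many principal components carry most of the variance.

```python
# "10 easy measurements, ~5 underlying variables" via PCA on simulated data.
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_factors, n_measured = 1000, 5, 10

latent = rng.normal(size=(n_samples, n_factors))    # the "real" variables
mixing = rng.normal(size=(n_factors, n_measured))   # how they show up in measurements
observed = latent @ mixing + 0.3 * rng.normal(size=(n_samples, n_measured))

# PCA via the eigenvalues of the correlation matrix (largest first).
eigvals = np.linalg.eigvalsh(np.corrcoef(observed, rowvar=False))[::-1]
explained = eigvals / eigvals.sum()
print(np.round(explained, 2))  # roughly 5 components carry almost all the variance
```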
You’re mostly talking about research in soft sciences, right?
Academically my background is bioinformatics. Depending on your view that might or might not be a soft science. I also care a lot about QS and have thought a lot about measurement in that area.
I don’t have much knowledge of academic physics and don’t want to presume that I know what it takes to advance academic physics.
I don’t know much about bioinformatics, so maybe this is a chance for me to learn something. What does it take to advance bioinformatics? Can you describe some examples?
One example from bioinformatics is the CpG island. CpG islands are basically parts of DNA with a lot of C and G, and those parts don’t contain genes.
At the beginning people tried to identify them with standards such as: when X% of a Y-base-pair-long stretch is C and G, that stretch is a CpG island. People argued about which numbers for X and Y would provide a more real way of identifying CpG islands.
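A minimal sketch of that rule-based approach; the thresholds below are placeholders, exactly the kind of X and Y people argued about.

```python
# Rule-based CpG-island calling: flag windows whose G+C fraction exceeds a
# threshold. min_gc and window are placeholder values, not agreed standards.
def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_cpg_islands(dna: str, min_gc: float = 0.6, window: int = 200) -> list:
    """Return (start, end) spans of windows whose GC fraction is >= min_gc."""
    hits = []
    for start in range(len(dna) - window + 1):
        if gc_fraction(dna[start:start + window]) >= min_gc:
            hits.append((start, start + window))
    return hits

# Toy sequence: an AT-rich flank, a GC-rich middle, another AT-rich flank.
islands = find_cpg_islands("AT" * 200 + "GC" * 200 + "AT" * 200)
print(len(islands), islands[0] if islands else None)
```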
Over time people decided against that approach. It’s better to have an expert identify a bunch of CpG islands by hand, by whatever standards he likes, and then train a hidden Markov model to identify CpG islands based on that training data.
Part of the idea is that CpG islands are not supposed to contain genes. Should the hidden Markov model identify some genes in CpG islands, one then tries to change the model’s training data.
Over time that gives you a concept of CpG islands that’s useful, because you put in training data to make it useful. The hidden Markov model might still identify some stretches of DNA as CpG islands that don’t have the characteristics we expected CpG islands to have, but no model is perfect.
As long as we can learn something useful from the model, it doesn’t need to be perfect. There is some distrust in bioinformatics of people who pretend that their model describes reality as it is, because most models don’t work in every case.
That’s also something to keep in mind when looking at projects such as the Blue Brain project. The goal isn’t to model a full human brain as it really is, but to test a simplified model of the human brain. If everything goes well, that model is good enough to learn something interesting about the human brain.
To use the words of Alfred Korzybski, who wasn’t a bioinformatician: the map isn’t the territory. Good maps describe reality well enough that they are useful for navigating reality and making further discoveries.
It might be equivalent to physicists who don’t focus on whether or not the Many Worlds hypothesis is real, but who focus on the math and on whether the equations provide good predictions, via “shut up and calculate”.
For “shut up and calculate” you need data. If you find a new way to efficiently gather reliable biological data, then you can shut up and calculate instead of worrying whether your numbers are “real” or “hard” (whatever you mean by hard).
Another example: financial engineers treating correlations of housing prices as if they were concrete and fixed.
I think utility is a real number; it just happens to be a person’s assigned value for a certain thing. If I assign 100 to something, then the number really is 100 as far as I am concerned; the fact that there is no instrument to measure this clouds the issue. It is similar to survey results: none of the numbers are absolutely measurable, as each is the person’s answer to that question at that time.
So, can you give the explicit numbers you assign to specific things?
Yes I can: at any given time in my life I have weightings assigned to some choices and outcomes (or at least a mental formula for them). These weightings may change based on my circumstances, but the point I was trying to make is that my numbers are every bit as real as any survey result or sales special.
For what it’s worth, survey results are fake numbers, in the sense that I’m using the word.