dynomight
If I hadn’t heard back from them, would you want me to tell you? Or would that be too sad?
Seed oils are usually solvent-extracted, which makes me wonder: how thoroughly are they scrubbed of solvent, what from the solvent gets absorbed into the oil (itself an effective solvent for various things), etc.
I looked into this briefly at least for canola oil. There, the typical solvent is hexane. And some hexane does indeed appear to make it into the canola oil that we eat. But hexane apparently has very low toxicity, and—more importantly—the hexane that we get from all food sources apparently makes up less than 2% of our total hexane intake! https://www.hsph.harvard.edu/nutritionsource/2015/04/13/ask-the-expert-concerns-about-canola-oil/ Mostly we get hexane from gasoline fumes, so if hexane is a problem, it’s very hard to see how to pin the blame on canola oil.
It’s a regression. Just like they extrapolate backwards to (1882+50=1932) using data from 1959, they extrapolate forwards at the end. (This is discussed in the “timelines” section.) This is definitely a valid reason to treat it with suspicion, but nothing’s “wrong” exactly.
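As a toy illustration (entirely made up, nothing to do with the paper's actual data or model) of the kind of out-of-range extrapolation at issue:

import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1959, 2020)  # the observed range
scores = 0.5 * (years - 1959) + rng.normal(0, 5, years.size)  # fake trend + noise

slope, intercept = np.polyfit(years, scores, 1)  # ordinary least-squares line
for year in (1932, 2050):                  # both far outside [1959, 2019]
    print(year, slope * year + intercept)  # bare point estimates, with no hint of
                                           # how much wider the uncertainty gets out here

The fit is perfectly valid inside the data; the suspicion is only about how far outside the data the line gets trusted.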
Many thanks! All fixed (except one that I prefer the old way).
As the original author of underrated reasons to be thankful (here), I guess I can confirm that tearing apart the sun for raw materials was not an intended implication.
I think matplotlib has way too many ways to do everything to be comprehensive! But I think you could do almost everything with some variants of these.
ax.spines['top'].set_visible(False)  # or 'left' / 'right' / 'bottom'
ax.set_xticks([0, 50, 100], ['0%', '50%', '100%'])
ax.tick_params(axis='y', left=False, right=False)  # or axis='x', bottom=False, top=False
ax.set_ylim([0, 0.30])
ax.set_ylim([0, ax.get_ylim()[1]])  # pin the bottom at 0, keep the current top
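In case it helps, here's how those fragments fit together in a complete script (my arrangement, with made-up data):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 25, 50, 75, 100], [0.03, 0.10, 0.16, 0.24, 0.28])  # made-up data

ax.spines['top'].set_visible(False)    # drop the top frame line
ax.spines['right'].set_visible(False)  # drop the right frame line
ax.set_xticks([0, 50, 100], ['0%', '50%', '100%'])  # ticks with custom labels
ax.tick_params(axis='y', left=False, right=False)   # hide the y tick marks
ax.set_ylim([0, 0.30])                 # fixed y limits

plt.show()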
Good point regarding year tick marks! I was thinking that labeling 0°C would make the most sense when freezing is really important. Say, if you were plotting historical temperature data and trying to estimate the last frost date in spring or something. Then, 10°C would mean “twice as much margin” as 5°C.
One way you could measure which one is “best” would be to measure how long it takes people to answer certain questions. E.g. “For what fraction of the 1997-2010 period did Japan spend more on healthcare per capita than the UK?” or “What’s the average ratio of healthcare spending in Sweden vs. Greece between 2000 and 2010?” (I think there is an academic literature on these kinds of experiments, though I don’t have any references on hand.)
In this case, I think Tufte goes overboard in saying you shouldn’t use color. But if the second plot had color, I’d venture it would win most such contests, if only because the y-axis is bigger and it’s easier to match the lines with the labels. But even if I don’t agree with everything Tufte says, I still find him useful because he suggests different options and different ways to think about things.
Thanks, someone once gave me the advice that after you write something, you should go back to the beginning and delete as many paragraphs as you can without making everything incomprehensible. After hearing this, I noticed that most people tend to write like this:
Intro
Context
Overview
Other various throat clearing
Blah blah blah
Finally an actual example, an example, praise god
Which is pretty easy to correct once you see it!
Hey, you might be right! I’ll take this as useful feedback that the argument wasn’t fully convincing. Don’t mean to pull a motte-and-bailey, but I suppose if I had to, I’d retreat to an argument like, “if making a plot, consider using these rules as one option for how to pick axes.” In any case, if you have any examples where you think following this advice leads to bad choices, I’d be interested to hear them.
I think you’re basically right: Correlation is just one way of measuring dependence between variables. Being correlated is a sufficient but not necessary condition for dependence. We talk about correlation so much because:
We don’t have a particularly convenient general scalar measure of how related two variables are. You might think about using something like mutual information, but for that you need densities, not just datasets. (See the sketch after this list.)
We’re still living in the shadow of the times when computers weren’t so powerful. We got used to doing all sorts of stuff based on linearity decades ago because we didn’t have any other options, and those methods became “conventional” even though we might have better options now.
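To make the “sufficient but not necessary” point concrete, here’s a small sketch (mine, not from the discussion above): Y = X² is completely determined by X, yet their correlation is essentially zero, while a crude histogram-based mutual information estimate, which forces you to pick bins (i.e., to implicitly estimate densities), still detects the dependence.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x**2  # perfectly dependent on x

print(np.corrcoef(x, y)[0, 1])  # ~0: correlation misses the dependence

# Crude plug-in mutual information from a 2-D histogram. Note the
# estimate depends heavily on the binning -- the "you need densities,
# not just datasets" problem.
counts, _, _ = np.histogram2d(x, y, bins=30)
pxy = counts / counts.sum()
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)
nz = pxy > 0
mi = (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()
print(mi)  # clearly > 0: the variables are dependent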
Thanks, you’ve 100% convinced me. (Convincing someone that something that (a) is known to be true and (b) they think isn’t surprising, actually is surprising is a rare feat, well done!)
Chat or instruction-finetuned models have poor prediction calibration, whereas base models (in some cases) have perfect calibration.
Tell me if I understand the idea correctly: Log-loss on next-token prediction leads to good calibration for single-token predictions, which manifests as well-calibrated percentage predictions? But then RLHF is some crazy loss, totally removed from calibration, that destroys all that?
If I get that right, it seems quite intuitive. Do you have any citations, though?
Sadly, no—we had no way to verify that.
I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated on politics than on science, but only very slightly better in terms of refinement/resolution. Intuitively, I’d expect data leakage to manifest as better refinement/resolution rather than better calibration.
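For concreteness, here’s a rough sketch (my own, with made-up forecasts) of the standard Murphy decomposition of the Brier score into reliability (calibration), resolution (refinement), and uncertainty:

import numpy as np

def brier_decomposition(p, y, n_bins=10):
    # p: forecast probabilities in [0, 1]; y: binary outcomes (0/1)
    p, y = np.asarray(p, float), np.asarray(y, float)
    bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
    ybar = y.mean()
    reliability = resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        w = mask.mean()                     # fraction of forecasts in this bin
        pk, ok = p[mask].mean(), y[mask].mean()
        reliability += w * (pk - ok) ** 2   # calibration term (lower is better)
        resolution += w * (ok - ybar) ** 2  # refinement term (higher is better)
    uncertainty = ybar * (1 - ybar)
    brier = np.mean((p - y) ** 2)
    # With binned forecasts, brier ~= reliability - resolution + uncertainty
    return brier, reliability, resolution, uncertainty

# Perfectly calibrated fake forecasts: outcomes drawn with probability p
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = (rng.uniform(size=10_000) < p).astype(int)
print(brier_decomposition(p, y))  # reliability ~0, resolution ~1/12

The intuition above would then be: leaked data should show up in the resolution term (the model “knows the answers”), not in the reliability term.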
That would definitely be better, although it would mean reading/scoring 1056 different responses, unless I can automate the scoring process. (Would LLMs object to doing that?)
Thank you, I will fix this! (Our Russian speaker agrees and claims they noticed this but figured it didn’t matter 🤔) I re-ran the experiments with the result that GPT-4 shifted from a score of +2 to a score of −1.
Well, no. But I guess I found these things notable:
Alignment remains surprisingly brittle and random. Weird little tricks remain useful.
The tricks that work for some models often seem to confuse others.
Cobbling together weird little tricks seems to help (Hindi ranger step-by-step).
At the same time, the best “trick” is a somewhat plausible story (duck-store).
PaLM 2 is the most fun, Pi is the least fun.
You’ve convinced me! I don’t want to defend the claim you quoted, so I’ll modify “arguably” into something much weaker.
I don’t think I have any argument that it’s unlikely aliens are screwing with us—I just feel it is, personally.
I definitely don’t assume our sensors are good enough to detect aliens. I’m specifically arguing we aren’t detecting alien aircraft, not that alien aircraft aren’t here. That sounds like a silly distinction, but I’d genuinely give much higher probability to “there are totally undetected alien aircraft on earth” than “we are detecting glimpses of alien aircraft on earth.”
Regarding your last point, I totally agree those things wouldn’t explain the weird claims we get from intelligence-connected people. (Except indirectly—e.g. rumors spread more easily when people think something is possible for other reasons.) I think that our full set of observations is hard to explain without aliens! That is, I think P[everything | no aliens] is low. I just think P[everything | aliens] is even lower.
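Spelling out the Bayes step implicit there (my gloss, in the same notation): P[aliens | everything] / P[no aliens | everything] = (P[aliens] / P[no aliens]) × (P[everything | aliens] / P[everything | no aliens]). So the fact that both likelihoods are small doesn’t matter on its own; only their ratio (times the prior odds) does.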
Do you happen to have any recommended pointers for research on the health impacts of processed food? It’s pretty easy to turn up a few recent meta-reviews, which seem like a decent place to start, but I’d be interested if there were any other sources, particularly influential individual experiments, etc. (There seem to be a whole lot of observational studies, but many fewer RCTs, for reasons that I guess are pretty understandable.) It seems like some important work here might never use the word “processing”.