Richard Korzekwa, Director at AI Impacts
Like, keep your eye out. For sure, keep your eye out.
I think this is related to my relative optimism about people spending time on approaches to alignment that are clearly not adequate on their own. It's not that I'm particularly bullish on the alignment schemes themselves, it's that I don't think I'd realized until reading this post that I had been assuming we all understood that we don't know wtf we're doing, so the most important thing is that we all keep an eye out for more promising threads (or ways to support the people following those threads, or places where everyone's dropping the ball on being prepared for a miracle, or whatever). Is this… not what's happening?
Product safety is a poor model for AI governance
75% of sufferers are affected day to day, so it's not just a cough; for the majority it's impacting people's lives, often very severely.
The UK source you link for this month says:
The proportion of people with self-reported long COVID who reported that it reduced their ability to carry out daily activities remained stable compared with previous months; symptoms adversely affected the day-to-day activities of 775,000 people (64% of those with self-reported long COVID), with 232,000 (19%) reporting that their ability to undertake their day-to-day activities had been “limited a lot”.
So, among people who self-report long covid, >80% say their day-to-day activities are not “limited a lot”. The dataset that comes with that page estimates the fraction of the UK population that would report such day-to-day-limiting long covid as 0.6%.
I agree that classic style as described by Thomas and Turner is a less moderate and more epistemically dubious way of writing, compared to what Pinker endorses. For example, from chapter 1 of Clear and Simple as the Truth:
Classic style is focused and assured. Its virtues are clarity and simplicity; in a sense so are its vices. It declines to acknowledge ambiguities, unessential qualifications, doubts, or other styles.
...
The style rests on the assumption that it is possible to think disinterestedly, to know the results of disinterested thought, and to present them without fundamental distortion....All these assumptions may be wrong, but they help to define a style whose usefulness is manifest.
I also agree that it is a bad idea to write in a maximally classic style in many contexts. But I think that many central examples of classic style writing are:
Not in compliance with the list of rules given in this post
Better writing than most of what is written on LW
It is easy to find samples of writing used to demonstrate characteristics of classic style in Clear and Simple as the Truth that use the first person, hedge, mention the document or the reader, or use the words listed in the "concepts about concepts" section. (To this post's credit, it is easy to get the impression that classic style does outright exclude these things, because Thomas and Turner, using classic style, do not hedge their explicit statements about what is or is not classic style, presumably because they expect the reader to see this clearly through examples and elaboration.)
Getting back to my initial comment, it is not clear to me what kind of writing this post is actually about. It’s hard to identify without examples, especially when the referenced books on style do not seem to agree with what the post is describing.
One of the reasons I want examples is because I think this post is not a great characterization of the kind of writing endorsed in Sense of Style. Based on this post, I would be somewhat surprised if the author had read the book in any detail, but maybe I misremember things or I am missing something.
[I typed all the quotes in manually while reading my ebook, so there are likely errors]
Self-aware style and signposting
Chapter 1 begins:
“Education is an admirable thing,” wrote Oscar Wilde, “but it is well to remember from time to time that nothing that is worth knowing can be taught.” In dark moments while writing this book, I sometimes feared that Wilde might be right.
This seems… pretty self-aware to me? He says outright that a writer should refer to themself sometimes:
Often the pronouns I, me, and you are not just harmless but downright helpful. They simulate a conversation, as classic style recommends, and they are gifts to the memory-challenged reader.
He doesn’t recommend against signposting, he just argues that inexperienced writers often overdo it:
Like all writing decisions, the amount of signposting requires judgement and compromise: too much, and the reader bogs down in reading the signposts; too little, and she has no idea where she is being led.
At the end of the first chapter, he writes:
In this chapter I have tried to call your attention to many of the writerly habits that result in soggy prose: metadiscourse, signposting, hedging, apologizing, professional narcissism, clichés, mixed metaphors, metaconcepts, zombie nouns, and unnecessary passives. Writers who want to invigorate their prose could try to memorize that list of don'ts. But it's better to keep in mind the guiding metaphor of classic style: a writer, in conversation with a reader, directs the reader's gaze to something in the world. Each of the don'ts corresponds to a way in which a writer can stray from this scenario.
Hedging
Pinker does not recommend that writers “eliminate hedging”, but he does advise against “compulsive hedging” and contrasts this with what he calls “qualifying”:
Sometimes a writer has no choice but to hedge a statement. Better still, the writer can qualify the statement, that is, spell out the circumstances in which it does not hold, rather than leaving himself an escape hatch or being coy about whether he really means it.
Concepts about concepts
In the section that OP’s “don’t use concepts about concepts” section seems to be based on, Pinker contrasts paragraphs with and without the relevant words:
What are the prospects for reconciling a prejudice reduction model of change, designed to get people to like one another more, with a collective action model of change, designed to ignite struggles to achieve intergroup equality?
vs
Should we try to change society by reducing prejudice, that is, by getting people to like one another? Or should we encourage disadvantaged groups to struggle for equality through collective action? Or can we do both?
My reading of Pinker is not that he's saying you can't use those words or talk about the things they represent. He's objecting to a style of writing that is clearly (to me) bad, and misuse of those words is what makes it bad.
Talk about the subject, not about research about the subject
I don’t know where this one even came from, because Pinker does this all the time, including in The Sense of Style. When explaining the curse of knowledge in chapter 3, he describes lots of experiments:
When experimental volunteers are given a list of anagrams to unscramble, some of which are easier than others because the answers were shown to them beforehand, they rate the ones that were easier for them (because they’d seen the answers) to be magically easier for everyone.
Classic Style vs Self-Aware Style
Also a nitpick about terminology. OP writes:
Pinker contrasts “classic style” with what he calls “postmodern style” — where the author explicitly refers to the document itself, the readers, the authors, any uncertainties, controversies, errors, etc. I think a less pejorative name for “postmodern style” would be “self-aware style”.
Pinker contrasts classic style with three or four other styles, one of which is postmodern style, and the difference between classic style and postmodern style is not whether the writer explicitly refers to themself or the document:
[Classic style and two other styles] differ from self-conscious, relativistic, ironic, or postmodern styles, in which "the writer's chief, if unstated, concern is to escape being convicted of philosophical naiveté about his own enterprise." As Thomas and Turner note, "When we open a cookbook, we completely put aside—and expect the author to put aside—the kind of question that leads to the heart of philosophic and religious traditions. Is it possible to talk about cooking? Do eggs really exist? Is food something about which knowledge is possible? Can anyone else ever tell us anything true about cooking? … Classic style similarly puts aside as inappropriate philosophical questions about its enterprise. If it took those questions up, it could never get around to treating its subject, and its purpose is exclusively to treat its subject."
(Note the implication that if philosophy or writing or epistemology or whatever is the subject, then you may write about it without going against the guidelines of classic style.)
I would find this more compelling if it included examples of classic style writing (especially Pinker’s writing) that fail at clear, accurate communication.
A common generator of doominess is a cluster of views that are something like “AGI is an attractor state that, following current lines of research, you will by default fall into with relatively little warning”. And this view generates doominess about timelines, takeoff speed, difficulty of solving alignment, consequences of failing to solve alignment on the first try, and difficulty of coordinating around AI risk. But I’m not sure how it generates or why it should strongly correlate with other doomy views, like:
Pessimism that warning shots will produce any positive change in behavior at all, separate from whether a response to a warning shot will be sufficient to change anything
Extreme confidence that someone, somewhere will dump lots of resources into building AGI, even in the face of serious effort to prevent this
The belief that narrow AI basically doesn’t matter at all, strategically
High confidence that the cost of compute will continue to drop on or near trend
People seem to hold these beliefs in a way that's not explained by the first list of doomy beliefs. It's not just that coordinating around reducing AI risk is hard because AGI is a thing you can make suddenly and by accident; it's that the relevant people and institutions are incapable of such coordination. It's not just that narrow AI won't have time to do anything important because of short timelines, it's that the world works in a way that makes it nearly impossible to steer in any substantial way unless you are a superintelligence.
A view like “aligning things is difficult, including AI, institutions, and civilizations” can at least partially generate this second list of views, but overall the case for strong correlations seems iffy to me. (To be clear, I put substantial credence in the attractor state thing being true and I accept at least a weak version of “aligning things is hard”.)
Montgolfier’s balloon was inefficient, cheap, slapped together in a matter of months
I agree the balloons were cheap in the sense that they were made by a couple of hobbyists. It's not obvious to me how many people at the time had the resources to make one, though. As for why nobody did it earlier, I suspect that textile prices were a big part of it. Without doing a very deep search, I did find a not-obviously-unreliable page with prices of things in Medieval Europe, and it looks like enough silk to make a balloon would have been very expensive. A sphere with a volume of 1060 m^3 (the volume of their first manned flight) has a surface area of ~600 yard^2. That page says a yard of silk in the 15th century was 10-12 shillings, so 600 yards would be ~6000s, or about £300. That same site lists "Cost of feeding a knight's or merchant's household per year" as "£30-£60, up to £100", so the silk would cost as much as feeding a household for 3-10 years.
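Spelling that arithmetic out as a quick sketch (the silk price and household-budget figures are just the ones from that page, so treat everything here as order-of-magnitude at best):

```python
import math

# Quick-and-dirty cost of a balloon's worth of 15th-century silk,
# using the rough figures quoted above.
volume_m3 = 1060                                    # volume of the first manned flight
radius_m = (3 * volume_m3 / (4 * math.pi)) ** (1 / 3)
area_yd2 = 4 * math.pi * radius_m**2 / 0.9144**2    # surface area, ~600 square yards

cost_shillings = area_yd2 * 10                      # ~10 shillings per yard of silk
cost_pounds = cost_shillings / 20                   # 20 shillings to the pound, ~£300

# Feeding a knight's or merchant's household: £30-£100 per year
years_low, years_high = cost_pounds / 100, cost_pounds / 30

print(f"~{area_yd2:.0f} yd^2, ~£{cost_pounds:.0f}, "
      f"~{years_low:.0f}-{years_high:.0f} years of household food")
```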
This is, of course, very quick-and-dirty and maybe the silk on that list is very different from the silk used to make balloons (e.g. because it’s used for fancy clothes). And that’s just the price at one place and time. But given my loose understanding of the status of silk and the lengths people went to to produce and transport it, I would not find it surprising if a balloon’s worth of silk was prohibitively expensive until not long before the Montgolfiers came along.
I also wonder if there's a scaling thing going on. The materials that make sense for smaller, proof-of-concept experiments are not the same as what makes sense for a balloon capable of lifting humans. So maybe people had been building smaller stuff with expensive/fragile things like silk and paper for a while, without realizing they could use heavier materials for a larger balloon.
it’s still not the case that we can train a straightforward neural net on winning and losing chess moves and have it generate winning moves. For AlphaGo, the Monte Carlo Tree Search was a major component of its architecture, and then any of the followup-systems was trained by pure self-play.
AlphaGo without the MCTS was still pretty strong:
We also assessed variants of AlphaGo that evaluated positions using just the value network (λ = 0) or just rollouts (λ = 1) (see Fig. 4b). Even without rollouts AlphaGo exceeded the performance of all other Go programs, demonstrating that value networks provide a viable alternative to Monte Carlo evaluation in Go.
Even the policy network on its own, using no search, could play at a solid amateur level:
We evaluated the performance of the RL policy network in game play, sampling each move...from its output probability distribution over actions. When played head-to-head, the RL policy network won more than 80% of games against the SL policy network. We also tested against the strongest open-source Go program, Pachi, a sophisticated Monte Carlo search program, ranked at 2 amateur dan on KGS, that executes 100,000 simulations per move. Using no search at all, the RL policy network won 85% of games against Pachi.
I may be misunderstanding this, but it sounds like the network that did nothing but get good at guessing the next move in professional games was able to play at roughly the same level as Pachi, which, according to DeepMind, had a rank of 2d.
Here’s a selection of notes I wrote while reading this (in some cases substantially expanded with explanation).
The reason any kind of ‘goal-directedness’ is incentivised in AI systems is that then the system can be given an objective by someone hoping to use their cognitive labor, and the system will make that objective happen. Whereas a similar non-agentic AI system might still do almost the same cognitive labor, but require an agent (such as a person) to look at the objective and decide what should be done to achieve it, then ask the system for that. Goal-directedness means automating this high-level strategizing.
This doesn’t seem quite right to me, at least not as I understand the claim. A system that can search through a larger space of actions will be more capable than one that is restricted to a smaller space, but it will require more goal-like training and instructions. Narrower instructions will restrict its search and, in expectation, result in worse performance. For example, if a child wanted cake, they might try to dictate actions to me that would lead to me baking a cake for them. But if they gave me the goal of giving them a cake, I’d find a good recipe or figure out where I can buy a cake for them and the result would be much better. Automating high-level strategizing doesn’t just relieve you of the burden of doing it yourself, it allows an agent to find superior strategies to those you could come up with.
Skipping the nose is the kind of mistake you make if you are a child drawing a face from memory. Skipping ‘boredom’ is the kind of mistake you make if you are a person trying to write down human values from memory. My guess is that this seemed closer to the plan in 2009 when that post was written, and that people cached the takeaway and haven’t updated it for deep learning which can learn what faces look like better than you can.
(I haven’t waded through the entire thread on the faces thing, so maybe this was mentioned already.) It seems to me that it’s a lot easier to point to examples of faces that an AI can learn from than examples of human values that an AI can learn from.
It also seems plausible that [the AIs under discussion] would be owned and run by humans. This would seem to not involve any transfer of power to that AI system, except insofar as its intellectual outputs benefit it
I think this is a good point, but isn’t this what the principal-agent problem is all about? And isn’t that a real problem in the real world?
That is, tasks might lack headroom not because they are simple, but because they are complex. E.g. AI probably can’t predict the weather much further out than humans.
They might be able to if they can control the weather!
IQ 130 humans apparently earn very roughly $6000-$18,500 per year more than average IQ humans.
I left a note to myself to compare this to disposable income. The US median household disposable income (according to the OECD, includes transfers, taxes, payments for health insurance, etc) is about $45k/year. At the time, my thought was “okay, but that’s maybe pretty substantial, compared to the typical amount of money a person can realistically use to shape the world to their liking”. I’m not sure this is very informative, though.
Often at least, the difference in performance between mediocre human performance and top level human performance is large, relative to the space below, iirc.
I take machine chess performance as evidence for a not-so-small range of human ability, especially when compared to rate of increase of machine ability. But I think it’s good to be cautious about using chess Elo as a measure of the human range of ability, in any absolute sense, because chess is popular in part because it is so good at separating humans by skill. It could be the case that humans occupy a fairly small slice of chess ability (measured by, I dunno, likelihood of choosing the optimal move or some other measure of performance that isn’t based on success rate against other players), but a small increase in skill confers a large increase in likelihood of winning, at skill levels achievable by humans.
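To make the Elo point a bit more concrete: ratings only encode relative win probability via the standard logistic formula, so they tell you how lopsided head-to-head results are, not how large the underlying gap in move quality is. A minimal sketch:

```python
# Standard Elo expected score: a rating gap maps directly to a win probability,
# independent of what the "absolute" skill difference is.
def expected_score(rating_gap: float) -> float:
    return 1 / (1 + 10 ** (-rating_gap / 400))

for gap in (50, 100, 200, 400):
    print(gap, round(expected_score(gap), 2))
# 50 -> 0.57, 100 -> 0.64, 200 -> 0.76, 400 -> 0.91
```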
~~Goal-directed entities may tend to arise from machine learning training processes not intending to create them (at least via the methods that are likely to be used).~~
I made my notes on the AI Impacts version, which was somewhat different, but it’s not clear to me that this should be crossed out. It seems to me that institutions do exhibit goal-like behavior that is not intended by the people who created them.
“Paxlovid’s usefulness is questionable and could lead to resistance. I would follow the meds and supplements suggested by FLCC”
Their guide says:
In a follow up post-marketing study, Paxlovid proved to be ineffective in patients less than 65 years of age and in those who were vaccinated.
This is wrong. The study reports the following:
Among the 66,394 eligible patients 40 to 64 years of age, 1,435 were treated with nirmatrelvir. Hospitalizations due to Covid-19 occurred in 9 treated and 334 untreated patients: adjusted HR 0.78 (95% CI, 0.40 to 1.53). Death due to Covid-19 occurred in 1 treated and 13 untreated patients; adjusted HR: 1.64 (95% CI, 0.40 to 12.95).
As the abstract says, the study did not have the statistical power to show a benefit for preventing severe outcomes in younger adults. It did not "prove [Paxlovid] to be ineffective"! This is very bad; the guide is clearly not a reliable source of information about covid treatments, and I recommend against following any other advice on that website.
I was going to complain that the language quoted from the abstract in the frog paper is sufficiently couched that it’s not clear the researchers thought they were measuring anything at all. Saying that X “suggests” Y “may be explained, at least partially” by Z seems reasonable to me (as you said, they had at least not ruled out that Z causes Y). Then I clicked through the link and saw the title of the paper making the unambiguous assertion that Z influences Y.
When thinking about a physics problem or physical process or device, I track which constraints are most important at each step. This includes generic constraints taught in physics classes like conservation laws, as well as things like “the heat has to go somewhere” or “the thing isn’t falling over, so the net torque on it must be small”.
Another thing I track is what everything means in real, physical terms. If there’s a magnetic field, that usually means there’s an electric current or permanent magnet somewhere. If there’s a huge magnetic field, that usually means a superconductor or a pulsed current. If there’s a tiny magnetic field, that means you need to worry about the various sources of external fields. Even in toy problems that are more like thought experiments than descriptions of the real world, this is useful for calibrating how surprised you should be by a weird result (e.g. “huh, what’s stopping me from doing this in my garage and getting a Nobel prize?” vs “yep, you can do wacky things if you can fill a cubic km with a 1000T field!”).
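As a concrete example of that last kind of calibration, here's a quick sketch of the energy stored in a cubic kilometre of 1000 T field, using just the standard B²/(2μ₀) energy density (the explosion comparison is only there for scale):

```python
import math

# Magnetic field energy density: u = B^2 / (2 * mu_0)
mu_0 = 4 * math.pi * 1e-7          # vacuum permeability, T*m/A
B = 1000.0                         # tesla
u = B**2 / (2 * mu_0)              # ~4e11 J/m^3

E = u * 1000.0**3                  # a cubic kilometre of that field, ~4e20 J
print(f"{u:.1e} J/m^3, {E:.1e} J total")
print(f"roughly {E / 4.184e18:.0f}x the energy of a 1-gigaton explosion")
```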
Related to both of these, I track which constraints and which physical things I have a good feel for and which I do not. If someone tells me their light bulb takes 10W of electrical power and creates 20W of visible light, I’m comfortable saying they’ve made a mistake*. On the other hand, if someone tells me about a device that works by detecting a magnetic field on the scale of a milligauss, I mentally flag this as “sounds hard” and “not sure how to do that or what kind of accuracy is feasible”.
*Something else I’m noticing as I’m writing this: I would probably mentally flag this as “I’m probably misunderstanding something, or maybe they mean peak power of 20W or something like that”
Communication as a constraint (along with transportation as a constraint) strikes me as important, but it seems like this pushes the question to "Why didn't anyone figure out how to control something that's more than a couple weeks away by courier?"
I suspect that, as Gwern suggests, making copies of oneself is sufficient to solve this, at least for a major outlier like Napoleon. So maybe another version of the answer is something like "Nobody solved the principal-agent problem well enough to get by on communication slower than a couple weeks". But it still isn't clear to me why that's the characteristic time scale. (I don't actually know what the time scale is, by the way; I just did five minutes of Googling to find estimates for courier time across the Mongol and Roman Empires.)
in a slow takeoff world, many aspects of the AI alignment problems will already have showed up as alignment problems in non-AGI, non-x-risk-causing systems; in that world, there will be lots of industrial work on various aspects of the alignment problem, and so EAs now should think of themselves as trying to look ahead and figure out which margins of the alignment problem aren’t going to be taken care of by default, and try to figure out how to help out there.
I agree with this, and I think it extends beyond what you’re describing here. In a slow takeoff world, the aspects of the alignment problem that show up in non-AGI systems will also provide EAs with a lot of information about what’s going on, and I think we should try to do things now that will help us to notice those aspects and act appropriately. (I’m not sure what this looks like; maybe we want to build relationships with whoever will be building these systems, or maybe we want to develop methods for figuring things out and fixing problems that are likely to generalize.)
I agree. We’re working on making it more browsable and making the hierarchical structure more apparent.
It is! You can see the announcement here.
Interesting, thanks for sharing.
I don’t think I’d take the paper itself seriously. Melatonin Research doesn’t seem to be a real journal and the paper looks very unprofessional. One of the paper’s authors is a lighting engineer who I think is trying to get people to use indoor lighting with more NIR, while the other seems to be a late-career melatonin guy who probably did some good research at some point. This isn’t to say that we should dismiss things unless they’re written by Credentialed Scientists® in a Serious Journal® and formatted in The Right Way®. But peer review does provide some protection from nonsense and we should be cautious about stuff that’s trying to look like it’s part of the academic peer-review system when it’s not.
As for the content, I’m not able to say much about the biology. I didn’t notice anything while skimming it that I know to be wrong, but I know very little biology, so that’s not surprising.
The optics is suspect. I'm pretty dubious about the thing with a "light guide" and the idea that the structure of a person's head/brain is supposed to distribute NIR in some way. I almost dismissed the stuff about NIR and the brain out of hand, but it turns out a human skull/scalp can diffusely transmit something like a few percent of 830nm light, at least according to this paper with a simple but reasonably compelling experiment.
I’ll admit that paper does leave me wondering if longer wavelengths do matter somehow. It looks like a person standing in sunlight or next to a campfire probably does get some non-negligible NIR illumination through a substantial portion of their body. But I wouldn’t take that paper as strong evidence for or against much of anything.
I had sort of vaguely assumed you were already doing something like this. It is pretty close to what I used to do for assigning grades while avoiding a "barely missed out" dynamic, in which someone would miss the cutoff for an A by 0.25%.
FWIW, I think questions like "what actually causes globally consequential things to happen or not happen" are one of the areas in which we're most dropping the ball. (AI Impacts has been working on a few related questions, more like "why do people sometimes not do the consequential thing?")
I think it’s good to at least spot check and see if there are interesting patterns. If “why is nobody doing X???” is strongly associated with large effects, this seems worth knowing, even if it doesn’t constitute a measure of expected effect sizes.