in case i forgot last month, here’s a link to july
A wager you say
One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock?
Zac says you can’t get a provably unpickable lock on this timeline. Zac gave (up to) 10:1 odds, so recall that the bet can be a positive expected value for Ben even if he thinks the event is most likely not going to happen.
For funsies, let’s map out one path of what has to happen for Zac to pay Ben $10k. This is not the canonical path, but it is a path:
Physics to the relevant granularity (question: can human lockpicks leverage sub-newtownian issues?) is conceptually placed into type theory or some calculus. I tried a riemann integral in coq once (way once), so it occurs to me that you need to decide if you want just the functional models (perhaps without computation / with proof irrelevance) in your proof stack or if you want the actual numerical analysis support in there as well.
Good tooling, library support, etc. around that conceptual work (call it mechlib) to provide mechanical engineering primitives
A lock designing toolkit, depending on mechlib, is developed
Someone (e.g. a large language model) is really good at programming in the lock designing toolkit. They come up with a spec L.
You state the problem “forall t : trajectories through our physics simulation, if L(t) == open(L) then t == key(L)”
Then you get to write a nasty gazillion line Lean proof
Manufacture a lock (did I mention that the design toolkit has links to actual manufacturing stacks?)
Everyone fails. Except Ben and the army of postdocs that $9,999 can buy.
Looks like after the magnificent research engineering in steps 1 and 2, the rest is just showing off and justifying those two steps. Of course, in a world where we have steps 1 and 2 we have a great deal of transformative applications of formal modeling and verification just in reach, and we’ll need a PoC like locks to practice and concretize the workflow.
Cryptography applications tend to have a curse of requiring a lot of work after the security context, permission set, and other requirements are frozen in stone, which means that when the requirements change you have to start over and throw out a bunch of work (epistemic status: why do you think so many defi projects have more whitepapers than users?). The provably unpickable lock has 2 to 10 x that problem– get the granularity wrong in step one, most of your mechlib implementation won’t be salvageable. As the language model iterates on the spec L in step 5, the other language model has to iterate on the proof in step 6, because the new spec will break most of the proof.
Sorry I don’t know any mechanical engineering, Ben, otherwise I’d take some cracks at it. The idea of a logic such that its denotation is a bunch of mechanical engineering primitives seems interesting enough that my “if it was easy to do in less than a year someone would’ve, therefore there must be a moat” heuristic is tingling. Perhaps oddly, the quantum semantics folks (or with HoTT!) seem to have been productive, but I don’t know how much of that is translatable to mechanical engineering.
Reinforcement learning from proof assistant feedback, and yet more monte carlo tree search
The steps are pretraining, supervised finetuning, RLPAF (reinforcement learning from proof assistant feedback), and MCTS (monte carlo tree search). RLPAF is not very rich: it’s a zero reward for any bug at all and a one for a happy typechecker. Glad they got that far with just that.
Harmonic ships their migration of miniF2F to Lean 4, gets 90% on it, is hiring
From their “one month in” newsletter. “Aristotle”, which has a mysterious methodology since I’ve only seen their marketing copy rather than an arxiv paper, gets 90% on miniF2F 4 when prompted with natural language proofs. It doesn’t look to me like the deepseek or LEGO papers do that? I could be wrong. It’s impressive just to autoformalize natural language proofs, I guess I’m still wrapping my head around how much harder it is (for an LLM) to implement coming up with the proof as well.
Atlas ships their big google doc alluded to in the last newsletter
Worth a read! The GSAI stack is large and varied, and this maps out the different sub-sub-disciplines. From the executive summary:
You could start whole organizations for every row in this table, and I wouldn’t be a part of any org that targets more than a few at once for fear of being unfocused. See the doc for more navigation (see what I did there? Navigating like with an atlas, perhaps? Get it?) of the field’s opportunities.[1]
Efficient shield synthesis via state-space transformation
Shielding is an area of reactive systems and reinforcement learning that marks states as unsafe and synthesizes a kind of guarding layer between the agent and the environment that prevents unsafe actions from being executed in the environment. So in the rejection sampling flavored version, it literally intercepts the unsafe action and tells the agent “we’re not running that, try another action”. One of the limitations in this literature is computational cost, shields are, like environments, state machines plus some frills, and there may simply be too many states. This is the limitation that this paper focuses on.
Note that verified software systems is an area which is highly suitable for a simplified gatekeeper workflow, in which the world-model is implicit in the specification logic. However, in the context of ARIA’s mission to “change the perception of what’s possible or valuable,” we consider that this application pathway is already perceived to be possible and valuable by the AI community. As such, this programme focuses on building capabilities to construct guaranteed-safe AI systems in cyber-physical domains. That being said, if you are an organisation which specialises in verified software, we would love to hear from you outside of this solicitation about the cyber-physical challenges that are just at the edge of the possible for your current techniques.
This is really cool stuff, I hope they find brave and adventurous teams. I had thought gatekeeper prototypes would be in minecraft or mujoco (and asked a funder if they’d support me in doing that), so it’s wild to see them going for actual cyberphysical systems so quickly.
See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than the fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swisscheesey worldview, and sounded too much like “why not just capture alignment properties in ‘specs’ and prove the software ‘correct’?” (i.e. the vibe I was responding to in my very pithy post). However, I think my main reason I’m not using Dickson’s post as a reason to just pivot all my worldview and resulting research is captured in one of Steve’s comments:
I’m focused on making sure our infrastructure is safe against AI attacks.
Like, a very strong version I almost endorse is “GSAI isn’t about AI at all, it’s about systems coded by extremely powerful developers (which happen to be AIs)”, and ensuring safety, security, and reliability capabilities scale at similar speeds with other kinds of capabilities.
It looks like one can satisfy Dickson just by assuring him that GSAI is a part of a swiss cheese stack, and that no one is messianically promoting One Weird Trick To Solve Alignment. Of course, I do hope that no one is messianically promoting One Weird Trick…
One problem off the top of my head regarding the InterFramework section: Coq and Lean seems the most conceptually straightforward since they have the same underlying calculus, but even there just a little impredicativity or coinduction could lead to extreme headaches. Now you can have a model at some point in the future that steamrolls over these headaches, but then you have a social problem of the broader Lean community not wanting to upstream those changes– various forks diverging fundamentally seems problematic to me, would lead to a lot of duplicated work and missed opportunities for collaboration. I plan to prompt Opus 3.5 with “replicate flocq in lean4” as soon as I get access to the model, but how much more prompting effort will it be to ensure compliance with preexisting abstractions and design patterns, so that it can not only serve my purposes but be accepted by the community? At least there’s no coinduction in flocq, though some of the proofs may rely on set impredicativity for all I know (I haven’t looked at it in a while).
This can be modeled as a conversation with readers, where the reader prompts the writer to taking the next step on the list.
Claim ought to be supported with reasons. Reasons ought to be based on evidence. Arguments are recursive: a part of an argument is an acknowledgment of an anticipated response, and another argument addresses that response. Finally, when the distance between a claim and a reason grows large, we draw connections with something called warrants.
The logic of warrants proceeds in generalities and instances. A general circumstance predictably leads to a general consequence, and if you have an instance of the circumstance you can infer an instance of the consequence.
Arguing in real life papers is complexified from the 5 steps, because
Claims should be supported by two or more reasons
A writer can anticipate and address numerous responses.
As I mentioned, arguments are recursive, especially in the anticipated response stage, but also each reason and warrant can necessitate a subargument.
You might embrace a claim too early, perhaps even before you have done much research, because you “know” you can prove it. But falling back on that kind of certainty will just keep you from doing your best thinking.
Thinking about a top-level post on FOMO and research taste
Fear of missing out defined as inability to execute on a project cuz there’s a cooler project if you pivot
but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
was discussing this with a friend: “yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair”
I’ve developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn’t have been as good in crucial ways, tho way better in other ways).
but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won’t work, which I’m more dubious on the value of than I used to be
Idk maybe this shortform is most of the value of the top level post
I get to have all these talkative blowhard traits and no one will punish me for it cuz I’m a girl. This is one major reason detrans would make my life worse. Society is so cruel to men, it sucks so much for them
And another trans woman had told me almost the exact same thing a couple months ago.
My take is that roles have upsides and downsides, and that you’ll do a bad job if you try to say one role is better or worse than another on net or say that a role is more downside than upside. Also, there are versions of “women talk too much” as a stereotype in many subcultures, but I don’t have a good inside view about it.
This may be true, but it might be that she’s incurring a bunch of social penalities she isn’t aware of. Women are less likely to overtly punish, so if she’s spending more time with women that could already explain it. No one yells at you to STFU, but you miss out on party invite you would have gotten if you shared the conversation better.
I suspect men are also more willing to tell other men to STFU than they are to say it to women, but will let someone else speak to that question.
The fact that both roles have advantages and disadvantage doesn’t necessarily prove that neither is better on net. Then again, “better” by what preferences? Lucky are the people whose preferences match the role they were assigned.
To me it seems that women have a greater freedom of self-expression, as long as they are not competitive. Men are treated instrumentally: they are socially allowed to work and to compete against each other, anything else is a waste of energy. For example, it is okay for a man to talk a lot, if he is a politician, manager, salesman, professor, priest… simply, if it is a part of his job. And when he is seducing a woman. Otherwise, he should be silent. Women are expected to chit-chat all the time, but they should never contradict men, or say anything controversial.
one may be net better than the other, I just think the expected error washes out all of one’s reasoning so individuals shouldn’t be confident they’re right.
Getting the easy things right shows respect for your readers and is the best training for dealing with the hard things.
If they don’t believe the evidence, they’ll reject the reasons and, with them, your claim.
We saw previously that claims ought to be supported with reasons, and reasons ought to be based on evidence. Now we will look closer at reasons and evidence.
Reasons must be in a clear, logical order. Atomically, readers need to buy each of your reasons, but compositionally they need to buy your logic. Storyboarding is a useful technique for arranging reasons into a logical order: physical arrangements of index cards, or some DAG-like syntax. Here, you can list evidence you have for each reason or, if you’re speculating, list the kind of evidence you would need.
When storyboarding, you want to read out the top level reasons as a composite entity without looking at the details (evidence), because you want to make sure the high-level logic makes sense.
Readers will not accept a reason until they see it anchored in what they consider to be a bedrock of established fact. … To count as evidence, a statement must report something that readers agree not to question, at least for the purposes of the argument. But if they do question it, what you think is hard factual evidence is for them only a reason, and you have not yet reached that bedrock of evidence on which your argument must rest.
I think there is a contract between you and the reader. You must agree to cite sources that are plausibly truthful, and your reader must agree to accept that these sources are reliable. A diligent and well-meaning reader can always second-guess whether, for instance, the beureau of subject matter statistics is collecting and reporting data correctly, but at a certain point this violates the social contract. If they’re genuinely curious or concerned, it may fall on them to investigate the source, not on you. The bar you need to meet is that your sources are plausibly trustworthy. The book doesn’t talk much about this contract, so there’s little I can say about what “plausible” means.
Sometimes you have to be extra careful to distinguish reasons from evidence, a (<claim>, <reason>, <evidence>) tuple is subject to regress in the latter two components, (A, B, C) may need to be justified by (B, C, D) and so on. The example given of this regress is if I told you (american higher education must curb escalating tuition costs, because the price of college is becoming an impediment to the american dream, today a majority of students leave college with a crushing debt burden). In the context of this sentence, “a majority of students...” is evidence, but it would be reasonable to ask for more specifics. In principle, any time information is compressed it may be reasonable to ask for more specifics. A new tuple might look like (the price of college is becoming an impediment to the american dream, because today a majority of students leave college with a crushing debt burden, in 2013 nearly 70% of students borrowed money for college with loans averaging $30000...). The third component is still compressing information, but it’s not in the contract between you and the reader for the reader to demand the raw spreadsheet, so this second tuple might be a reasonable stopping point of the regress.
If you can imagine readers plausibly asking, not once but many times, how do you know that? What facts make it true?, you have not yet reached what readers want—a bedrock of uncontested evidence.
Sometimes you have to be careful to distinguish evidence from reports of it. Again, because we are necessarily dealing with compressed information, we can’t often point directly to evidence. Even a spreadsheet, rather than summary statistics of it, is a compression of the phenomena in base reality that it tracks.
data you take from a source have invariably been shaped by that source, not to misrepresent them, but to put them in a form that serves that source’s ends. … when you in turn report those data as your own evidence, you cannot avoid manipulating them once again, at least by putting them in a new context.
There is a criteria you want to screen your evidence with respect to.
sufficient
representative
accurate
precise
authoritative
Being honest about the reliability and prospective accuracy of evidence is always a positive signal. Evidence can be either too precise or not precise enough. The women in one or two of Shakespeare’s plays do not represent all his women, they are not representative. Figure out what sorts of authority signals are considered credible in your community, and seek to emulate them.
Primary sources provide you with the “raw data” or evidence you will use to develop, test, and ultimately justify your hypothesis or claim.
Secondary sources are books, articles, or reports that are based on primary sources and are intended for scholarly or professional audiences.
Tertiary sources are books and articles that synthesize and report on secondary sources for general readers, such as textbooks, articles in encyclopedias, and articles in mass-circulation publications.
The distinction between primary and secondary sources comes from 19th century historians, and the idea of tertiary sources came later. The boundaries can be fuzzy, and are certainly dependent on the task at hand.
I want to reason about what these distinctions look like in the alignment community, and whether or not they’re important.
The rest of chapter five is about how to use libraries and information technologies, and evaluating sources for relevance and reliability.
Chapter 6 starts off with the kind of thing you should be looking for while you read
Look for creative agreement
Offer additional support. You can offer new evidence to support a source’s claim.
Confirm unsupported claims. You can prove something that a source only assumes or speculates about.
Apply a claim more widely. You can extend a position.
Look for creative disagreement
Contradictions of kind. A source says something is one kind of thing, but it’s another.
Part-whole contradictions. You can show that a source mistakes how the parts of something are related.
Developmental or historical contradictions. You can show that a source mistakes the origin or development of a topic.
External cause-effect contradictions. You can show that a source mistakes a causal relationship.
Contradictions of perspective. Most contradictions don’t change a conceptual framework, but when you contradict a “standard” view of things, you urge others to think in a new way.
The rest of chapter 6 is a few more notes about what you’re looking for while reading (evidence, reasons), how to take notes, and how to stay organized while doing this.
The alignment community
I think I see the creative agreement modes and the creative disagreement modes floating around in posts. Would it be more helpful if writers decided on one or two of these modes before sitting down to write?
Moreover, what is a primary source in the alignment community? Surely if one is writing about inner alignment, a primary source is the Risks from Learned Optimization paper. But what are Risks’ primary, secondary, tertiary sources? Does it matter?
Now look at Arbital. Arbital started off to be a tertiary source, but articles that seemed more like primary sources started appearing there. I remember distinctively thinking “what’s up with that?” it struck me as awkward for Arbital to change it’s identity like that, but I end up thinking about and citing the articles that seem more like primary sources.
There’s also the problem of stuff in the memeplex not written down is the real “primary” source while the first person who happens to write it down looks like they’re writing a primary source when in fact what they’re doing is really more like writing a secondary or even tertiary source.
Yesterday I quit my job for direct work on epistemic public goods! Day one of direct work trial offer is April 4th, and it’ll take 6 weeks after that to know if I’m a fulltime hire.
I’m turning down
raise to 200k/yr usd
building lots of skills and career capital that would give me immense job security in worlds where investment into one particular blockchain doesn’t go entirely to zero
having fun on the technical challenges
for
confluence of my skillset and a theory of change that could pay huge dividends in the epistemic public goods space
0.35x paycut from my upcoming raise
uncertainty of it being a trial offer.
having fun on the technical challenges
Which I’m flagging in such detail to give you strength if you’re ever reasoning about your risk tolerance and your goals, just remember, “look at what quinn did!”
I think this is a crucial part of a lot of psychological maladaption and social dysfunction, very salient to EAs. If you’re way more trait xyz than anyone you know for most of your life, your behavior and mindset will be massively effected, and depending on when in life / how much inertia you’ve accumulated by the time you end up in a different room where suddenly you’re average on xyz, you might lose out on a ton of opportunities for growth.
In other words, the concept of “big fish small pond” is deeply insightful and probably underrated.
Some IQ-adjacent idea is sorta the most salient to me, since my brother recently reminded me “quinn is the smartest person I know”, to which I was like, you should meet smarter people? Or I kinda did feel unusually smart before I was an EA, I can only reasonably claim to be average if you condition on EA or something similar. But this post is extremely important in terms of each of the Big 5, “grit”-adjacent things, etc.
For example, when you’re way more trait xyz than anyone around you, you form habits around adjusting for people to underperform relative to you at trait xyz. Sometimes those habits run very deep in your behavior and wordview, and sometimes they can be super ill-tuned (or at least a bit suboptimal) to becoming average. Plus, you develop a lot of “I have to pave my own way” assumptions about growth and leadership. Related to growth, you may cultivate lower standards for yourself than you otherwise might have. Related to leadership, I expect many people in leader roles at small ponds would be more productive, impactful, and happy if they had access to averageness. Pond size means they don’t get that luxury!
There’s a tightly related topic about failure to abolish meatspace / how you might think the internet corrects for this but later realize how much it doesn’t.
So, being a “big fish in a small pond” teaches you habits that become harmful when you later move to a larger pond. But if you don’t move, you can’t grow further.
I think the specific examples are more known that the generalization. For example:
Many people in Mensa are damaged this way. They learned to be the smartest ones, which they signal by solving pointless puzzles, or by talking about “smart topics” (relativity, quantum, etc.) despite the fact that they know almost nothing about these topics. Why did they learn these bad habits? Because this is how you most efficiently signal intelligence to people who are not themselves intelligent. But it fails to impress the intelligent people used to meeting other intelligent people, because they see the puzzles as pointless, they see the smart talk as bullshit if they ever read an introductory textbook on the topic, and will ask you about your work and achievements instead. The useful thing would instead be to learn how to cooperate with other intelligent people on reaching worthy goals.
People who are too smart or too popular at elementary school (or high school) may be quite shocked when they move to a high school (or university) and suddenly their relative superpowers are gone. If they learned to rely on them too much, they may have a problem adapting to normal hard work or normal friendships.
Staying at the same job for too long might have a similar effect. You feel like an expert because you are familiar with all systems in the company. Then at some moment fate makes you change jobs, and suddenly you realize that you know nothing, that the processes and technologies used in your former company were maybe obsolete. But the more you delay changing jobs, the harder it becomes.
I remember reading in a book by László Polgár, father of the famous female chess players, how he wanted his girls to play in the “men’s” chess league since the beginning, because that’s what he wanted them to win. He was afraid that playing in smaller leagues would learn them habits useful only for the smaller leagues. Technically, the “men’s” chess league was open for everyone, but because there were no women among the winners (yet), a separate league only for women was made. Polgár did not want his girls to compete in the league for women, and that offended many people.
From evolutionary perspective, when people lived in small tribes, if you were the best in your tribe, you remained the best in your tribe (maybe until someone younger than you outcompeted you a few years later). So it made sense to adapt to the situation you had. Our society is weirdly organized from this perspective—as an adult, you will be pushed to compete against the best (sometimes literally in the entire world), and yet as a small child you are put into an elementary school with average kids, where you get the wrong expectations of your future environment. A partial antidote to that are various competitions, where you can compete against similarly talented kids from other schools, so even if you are by far the best at your school, you still know there is much to learn.
Technically, the “men’s” chess league was open for everyone, but because there were no women among the winners
I think this wasn’t true at the time, at least in Hungary. The oldest sister and their father spent a lot of time fighting this, so it was ~true by the time the youngest sister got really competitive. This might prove the larger point, since the youngest sister also went the farthest.
Uh, good catch! Then I am surprised that they actually succeeded to win this. It would be too easy and possibly very tempting to just say “you broke the rules, disqualified!” Or at least, I would expect a debate to last for a decade, and then it would be too late for the Polgár sisters.
yeah IQ ish things or athletics are the most well-known examples, but I only generalized in the shortform cuz I was looking around at my friends and thinking about more Big Five oriented examples.
Certainly “conscientiousness seems good but I’m exposed to the mistake class of unhelpful navelgazing, so maybe I should be less conscientious” is so much harder to take seriously if you’re in a pond that tends to struggle with low conscientiousness. Or being so low on neuroticism that your redteam/pentest muscles atrophy.
I think a property of my theory of change is that academic and commercial speed is a bottleneck. I recently realized that my mass assignment for timelines synchronized with my mass assignment for the prosaic/nonprosaic axis. The basic idea is that let’s say a radical new paper that blows up and supplants the entire optimization literature gets pushed to the arxiv tomorrow, signaling the start of some paradigm that we would call nonprosaic. The lag time for academics and industry to figure out what’s going on, figure out how to build on that result, for developer ecosystems to form, would all compound to take us outside of what we would call “short timelines”.
The reasoning assumes that ideas are first generated in academia and don’t arise inside of companies. With DeepMind outperforming the academic protein folding community when protein folding isn’t even the main focus of DeepMind I consider it plausible that new approaches arise within a company and get only released publically when they are strong enough to have an effect.
Even if there’s a paper most radical new papers get ignored by most people and it might be that in the beginning only one company takes the idea seriously and doesn’t talk about it publically to keep a competive edge.
That’s totally fair, but I have a wild guess that the pipeline from google brain to google products is pretty nontrivial to traverse, and not wholly unlike the pipeline from arxiv to product.
Like, AlexNet was 2012, DeepMind patented deep Q learning in 2014, the first TensorFlow release was 2015, the first PyTorch release was 2016, the first TPU was 2016, and by 2019 we had billion-parameter GPT-2 …
So if you say “Short is ≤2 years”, then yeah, I agree. If you say “Short is ≤8 years”, I think I’d disagree, I think 8 years might be plenty for a non-prosaic approach. (I think there are a lot of people for whom AGI in 15-20 years still counts as “short timelines”. Depends on who you’re talking to, I guess.)
I should’ve mentioned in OP but I was lowkey thinking upper bound on “short” would be 10 years.
I think developer ecosystems are incredibly slow (longer than ten years for a new PL to gain penetration, for instance). I guess under a singleton “one company drives TAI on its own” scenario this doesn’t matter, because tooling tailored for a few teams internal to the same company is enough which can move faster than a proper developer ecosystem. But under a CAIS-like scenario there would need to be a mature developer ecosystem, so that there could be competition.
I feel like 7 years from AlexNet to the world of PyTorch, TPUs, tons of ML MOOCs, billion-parameter models, etc. is strong evidence against what you’re saying, right? Or were deep neural nets already a big and hot and active ecosystem even before AlexNet, more than I realize? (I wasn’t paying attention at the time.)
Moreover, even if not all the infrastructure of deep neural nets transfers to a new family of ML algorithms, much of it will. For example, the building up of people and money in ML, the building up of GPU / ASIC servers and the tools to use them, the normalization of the idea that it’s reasonable to invest millions of dollars to train one model and to fab ASICs tailored to a particular ML algorithm, the proliferation of expertise related to parallelization and hardware-acceleration, etc. So if it took 7 years from AlexNet to smooth turnkey industrial-scale deep neural nets and billion-parameter models and zillions of people trained to use them, then I think we can guess <7 years to get from a different family of learning algorithms to the analogous situation. Right? Or where do you disagree?
No you’re right. I think I’m updating toward thinking there’s a region of nonprosaic short-timelines universes. Overall it still seems like that region is relatively much smaller than prosaic short-timelines and nonprosaic long-timelines, though.
For every person who has a bad reason that they catch because you say “sounds like cope”, there are 10x as many people who find their reason actually compelling. Saying “if that was my reason it would be a sign I was in denial of how hard I was coping” or “I don’t think that reason is compelling” isn’t really relevant to the person you’re ostensibly talking to, who’s trying to make the best decisions for the best reasons. Just say you don’t understand why the reason is compelling.
I’m not sure I have experienced a “sounds like cope” reasoning, or at least it doesn’t match to discussions I’ve noted. Is this similar to “people under stress are bad at updating”? Why would you expect them to be better at communicating than they are at reasoning?
I asked a friend whether I should TA for a codeschool called ${{codeschool}}.
You shouldn’t hang around ${{codeschool}}. People at ${{codeschool}} are not pursuing excellence.
A hidden claim there that I would soak up the pursuit of non-excellence by proximity or osmosis isn’t what’s interesting (though I could see that turning out either way). What’s interesting is the value of non-excellence, which I’ll call adequacy.
${{codeschool}} in this case is effective and impactful at putting butts in seats at companies, and is thereby responsible for some negligible slice of economic growth. It’s students and instructors are plentiful with the virtue of getting things done, do they really need the virtue of high-craftsmanship? The student who reads SICP and TAPL because they’re pursuing mastery over the very nature of computation is strictly less valuable to the economy than the student who reads react tutorials because they’re pursuing some cash.
Obviously, my friend who was telling me this was of the SICP/TAPL type. In software, this is problematic: lisp and type theory will increase your thinking about the nature of computation, but will it increase your thinking about the social problem of steering a team? From an employer’s perspective, it is naive to prefer excellence over adequacy, it is much wiser to saddle the excellent person with the burden of proving that they won’t get bored easily.
Hufflepuffs can go far, and the fuel is adequacy. Enough competence to get it done, any more is egotistical, a sunk cost.
But what if it’s not about industry/markets, what if it’s about the world’s biggest problems? Don’t we want people who are more competent than strictly necessary to be working on them? Maybe, maybe not.
For a long time I’ve operated in the excellence mindset: more energy for struggling with textbooks than for exploiting the skills I already have to ship projects and participate in the real world. Thinking it might be good to shift gears and flex my hufflepuff virtues more.
The student who reads SICP and TAPL because they’re pursuing mastery over the very nature of computation is strictly less valuable to the economy than the student who reads react tutorials because they’re pursuing some cash.
Seems to me that on the market there are very few jobs for the SICP types.
The more meta something is, the less of that is needed. If you can design an interactive website, there are thousands of job opportunities for you, because thousands of companies want an interactive website, and somehow they are willing to pay for reinventing the wheel. If you can design a new programming language and write a compiler for it… well, it seems that world already has too many different programming languages, but sure there is a place for maybe a dozen more. The probability of success is very small even if you are a genius.
The best opportunity for developers who think too meta is probably to design a new library for an already popular programming language, and hope it becomes popular. The question is how exactly you plan to get paid for that.
Probably another problem is that it requires intelligence to recognize intelligence, and it requires expertise to recognize expertise. The SICP type developer seems to most potential employers and most potential colleagues as… just another developer. The company does not see individual output, only team output; it does not matter that your part of code does not contain bugs, if the project as a whole does. You cannot use solutions that are too abstract for your colleagues, or for your managers. Companies value replaceability, because it is less fragile and helps to keep developer salaries lower than they might be otherwise. (In theory, you could have a team full of SICP type developers, which would allow them to work smarter, and yet the company would feel safe. In practice, companies can’t recognize this type and don’t appreciate it, so this is not going to happen.)
Again, probably the best position for a SICP type developer in a company would be to develop some library that the rest of the company would use. That is, a subproject of a limited size that the developer can do alone, so they are not limited in the techniques they use, as long as the API is comprehensible. Ah, but before you are given such opportunity, you usually have to prove yourself in the opposite type of work.
Sometimes I feel like having a university for software developers just makes them overqualified for the market. A vocational school focusing on the current IT hype would probably make most companies more happy. Also the developers, though probably only in short term, before a new hype comes and they face the competition of a new batch of vocational school graduates trained for the new hype. A possible solution for the vocational school would be to also offer retraining courses for their former students, like three or six months to become familiar with the new hype.
I used to think “community builder” was a personality trait I couldn’t switch off, but once I moved to the bay I realized that I was just desperate for serendipity and knew how to take it from 0% to 1%. Since the bay is constantly humming at 70-90% serendipity, I simply lost the urge to contribute.
Benefactors are so over / beneficiaries are so back / etc.
Let FairBot be the player that sends an opponent to Cooperate (C) if it is provable that they cooperate with FairBot, and sends them to Defect (D) otherwise.
Let FairBot_k be the player that searches for proofs of length <= k that it’s input cooperates with FairBot_k, and cooperates if it finds one, returning defect if all the proofs of length <= k are exhausted without one being valid.
Critch writes that “100%” of the time, mathematicians and computer scientists report believing that FairBot_k(FairBot_k) = D, owing to the basic vision of a stack overflow exceeding the value k (spoiler in the footnote[1] for how it actually shakes out, in what is now a traditional result in open source game theory).
I am one of these people who believe that FairBot_k(FairBot_k) = D, because I don’t understand Löb, nor do I understand parametric Löb. But I was talking about this on two separate occasions with friends Ben and Stephen, both of whom made the same remark, a remark which I have not seen discussed.
The solution set of an equation approach.
One shorter way of writing FairBot is this
FB:=a↦a(FB)
because when a lands in {C,D}, ifa(FB)=CthenCelseD collapses to a(FB).
Here, I’m being sloppy about evaluation vs. provability. I’m taking what was originally ”a(FB) is provable” and replacing it with ”a is evaluable at FB”, and assuming decidability so that I can reflect into bool for testing in an if. Then I’m actually performing the evaluation.
Stepping back, if you can write down
E:FB(a)=a(FB)
and you know that a and FB share a codomain (the moves of the game, in this case {C,D}), then the solution set of this equation SS(E)=coda=codFB. In other words, the equation is consistent at a(FB)=C=FB(a) and consistent at a(FB)=D=FB(a), and there may not be a principled way of choosing one particular item of SS(E) in general. In other words, the proofs of the type FB(a)=a(FB) are not unique.
What the heck is the type-driven story?
I’m guessing there’s some solution to this problem in MIRI’s old haskell repo, but I haven’t been able to find it reading the code yet.
I can’t think of a typey way to justify A:=A→{C,D}:Type. It’s simply nonsensical, or I’m missing something about a curry-howard correspondence with arithmoquining. In other words, agents like FairBot that take other agents as input and return moves are a lispy/pythonic warcrime, in terms of type-driven ideology.
Questions
Am I confused because of a subtlety distinguishing evaluability and provability?
Am I confused because of some nuance about how recursion really works?
it turns out that Löb’s theorem implies that FairBot cooperates with FairBot, and a proof-length-aware variant of Löb’s theorem implies that FairBot_k cooperates with FairBot_k.
It is almost certainly true that setting k=1, Fairbot_1 defects against Fairbot_1 because there are no proofs of cooperation that are 1 bit in length. There can be exceptions: for instance, where Fairbot_1(Fairbot_1) = C is actually an axiom, and represented with a 1-bit string.
It is definitely not true that Fairbot_k cooperates with Fairbot_k for all k and all implementations of Fairbot_k, with or without Löb’s theorem. It is also definitely not true that Fairbot_k defects against Fairbot_k in general. Whether they cooperate or defect depends upon exactly what proof system and encoding they are using.
I think that to get the type of the agent, you need to apply a fixpoint operator. This also happens inside the proof of Löb for constructing a certain self-referential sentence. (As a breadcrumb, I’ve heard that this is related to the Y combinator.)
I find myself, just as a random guy, deeply impressed at the operational competence of airports and hospitals. Any good books about that sort of thing?
It is pretty impressive that they function as well as they do, but seeing how the sausage is made (at least in hospitals) does detract from it quite substantially. You get to see not only how an enormous number of battle hardened processes prevent a lot of lethal screw-ups, but also how also how sometimes the very same processes cause serious and very occasionally lethal screw-ups.
It doesn’t help that hospitals seem to be universally run with about 90% of the resources they need to function reasonably effectively. This is possibly because there is relentless pressure to cut costs, but if you strip any more out of them then people start to die from obviously preventable failures. So it stabilizes at a point where everything is much more horrible than it could be, but not quite to an obviously lethal extent.
As far as your direct question goes, I don’t have any good books to recommend.
Rats and EAs should help with the sanity levels in other communities
Consider politics. You should take your political preferences/aesthetics, go to the tribes that are based on them, and help them be more sane. In the politics example, everyone’s favorite tribe has failure modes, and it is sort of the responsibility of the clearest-headed members of that tribe to make sure that those failure modes don’t become the dominant force of that tribe.
Speaking for myself, having been deeply in an activist tribe before I was a rat/EA, I regret I wasn’t there to help the value-aligned and clear-headed over the last few years while some of that tribe’s worst pathologies made gains. Now it seems almost too late for them.
Actionably, I want you to
Write for journals, forums, blogospheres, zines outside of rat and EA.
Dump time into tribes that might not be the state of the art in sanity, find the most sane people there, and find ways to support them.
I speak not (well, not entirely) from my cognitive dissonance at having abandoned an aesthetic I still have feelings for. I think
Tribes besides ours are what make up the overall sanity waterline
It’s ok to set aside humility and imposter syndrome and say “I can actionably be a resource of sanity for someone else”, even tho you personally think you have a lot of work to do at getting less wrong yourself. I would say the opposite of the “affix your mask before helping others” comic strip: find synergies between mentoring others in the art and continuing to master the art yourself.
We basically want every tribe to believe true things and think clearly about their values. Yes, I’m obviously concerned that this will lead to some of my fellow rats taking my advice, applying it to a political aesthetic I find barbaric, and helping that political aesthetic win—I think this concern is basically fine because on net I expect more true beliefs and clear thinking about values to make the meaning of winning for each tribe converge on something that isn’t zero-sum.
I should also mention that I expect an externality from this effort to be an increase in the intrarat / intraEA intellectual diversity.
Broadly, the two kinds of claims are conceptual and practical.
Conceptual claims ask readers not to ask, but to understand. The flavors of conceptual claim are as follows:
Claims of fact or existence
Claims of definition and classification
Claims of cause and consequence
Claims of evaluation or appraisal
There’s essentially one flavor of practical claim
Claims of action or policy.
If you read between the lines, you might notice that a kind of claim of fact or cause/consequence is that a policy works or doesn’t work to bring about some end. In this case, we see that practical claims deal in ought or should. There is a difference, perhaps subtle perhaps not, between “X brings about Y” and “to get Y we ought to X”.
Readers expect a claim to be specific and significant. You can evaluate your claim along these two axes.
To make a claim specific, you can use precise language and explicit logic. Usually, precision comes at the cost of a higher word count. To gain explicitness, use words like “although” and “because”. Note some fields might differ in norms.
You can think of significance of a claim as the quantity it asks readers to change their mind, or I suppose even behavior.
While we can’t quantify significance, we can roughly estimate it: if readers accept a claim, how many other beliefs must they change?
Avoid arrogance.
As paradoxical as it seems, you make your argument stronger and more credible by modestly acknowledging its limits.
Two ways of avoiding arrogance are acknowledging limiting conditions and using hedges to limit certainty.
Don’t run aground: there are innumerable caveats that you could think of, so it’s important to limit yourself only to the most relevant ones or the ones that readers would most plausibly think of. Limiting certainty with hedging is given by example of Watson and Crick, publishing what would become a high-impact result, “We wish to suggest … in our opinion … we believe … Some … appear”
without the hedges, Crick and Watson would be more concise but more aggressive.
In most fields, readers distrust flatfooted certainty
It is not obvious how to walk the line between hedging too little and hedging too much.
It is not obvious how to walk the line between hedging too little and hedging too much.
This may be context-dependent. Different countries probably have different cultural norms. Norms may differ for higher-status and lower-status speakers. Humble speech may impress some people, but others may perceive it as a sign of weakness. Also, is your audience fellow scientists or are you writing a popular science book? (More hedging for the former, less hedging for the latter.)
notes (from a very jr researcher) on alignment training pipeline
Training for alignment research is one part competence (at math, cs, philosophy) and another part having an inside view / gears-level model of the actual problem. Competence can be outsourced to universities and independent study, but inside view / gears-level model of the actual problem requires community support.
A background assumption I’m working with is that training as a longtermist is not always synchronized with legible-to-academia training. It might be the case that jr researchers ought to publication-maximize for a period of time even if it’s at the expense of their training. This does not mean that training as a longtermist is always or even often orthogonal to legible-to-academia training, it can be highly synchronized, but it depends on the occasion.
It’s common to query what relative ratio should be assigned to competence building (textbooks, exercises) vs. understanding the literature (reading papers and alignment forum), but perhaps there is a third category- honing your threat model and theory of change.
I spoke with a sr researcher recently who roughly said that a threat model with a theory of change is almost sufficient for an inside view / gears-level model. I’m working from the theory that honed threat models and your theory of change are important to calculate interventions. See Alice and Bob in Rohin’s faq.
I’ve been trying by doing exercises with a group of peers weekly to hone my inside view / gears-level model of the actual problem. But the sr researcher i spoke to said mentorship trees of 1:1 time, not exercises that jrs can just do independently or in groups, is the only way it can happen. This is troublesome to me, as the bottleneck becomes mentors’ time. I’m not so much worried about the hopefully merit-based process of mentors figuring out who’s worth their time, as I am about the overall throughput. It gets worse though- what if the process is credentialist?
Take a look at the Critch quote from the top of Rohin’s faq:
I get a lot of emails from folks with strong math backgrounds (mostly, PhD students in math at top schools) who are looking to transition to working on AI alignment / AI x-risk.
Is he implicitly saying that he offloads some of the filtering work to admissions people at top schools? Presumably people from non-top schools are also emailing him, but he doesn’t mention them.
I’d like to see a claim that admissions people at top schools are trustworthy. No one has argued this to my knowledge. I think sometimes the movement falls back on status games, unless there is some intrinsic benefit to “top schools” (besides building social power/capital) that everyone is aware of. (Indeed if someone’s argument is that they identified a lever that requires a lot of social power/capital, then they can maybe put that top school on their resume to use, but if the lever is strictly high quality useful research (instead of say steering a federal government) this doesn’t seem to apply).
Is he implicitly saying that he offloads some of the filtering work to admissions people at top schools?
I don’t think Critch’s saying that the best way to get his attention is through cold emails backed up by credentials. The whole post is about him not using that as a filter to decide who’s worth his time but that people should create good technical writing to get attention.
Critch’s written somewhere that if you can get into UC Berkeley, he’ll automatically allow you to become his student, because getting into UC Berkeley is a good enough filter.
Where did he say that? Given that he’s working at UC Berkeley I would expect him to treat UC Berkeley students preferentially for reasons that aren’t just about UC Berkeley being able to filter.
It’s natural that you can sign up for one of the classes he teaches at UC Berkeley by being a student of UC Berkeley.
Being enrolled into MIT might be just as hard as being enrolled into UC Berkeley but it doesn’t give you the same access to courses taught at UC Berkeley by it’s faculty.
If you get into one of the following programs at Berkeley:
a PhD program in computer science, mathematics, logic, or statistics, or
a postdoc specializing in cognitive science, cybersecurity, economics, evolutionary biology, mechanism design, neuroscience, or moral philosophy,
… then I will personally help you find an advisor who is supportive of you researching AI alignment, and introduce you to other researchers in Berkeley with related interests.
and also
While my time is fairly limited, I care a lot about this field, and you getting into Berkeley is a reasonable filter for taking time away from my own research to help you kickstart yours.
I’m excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that chatgpt and I can both predict what a compiler of a suitably popular programming language will do on some input, what’s going on there---- surely we’re not reimplementing the compiler on our substrate, even in the limit of perfect prediction? Will be an opportunity for a programming language theorist in another year or two of interp progress
In the Safeguarded AI programme thesis[1], proof certificates or
certifying algorithms are relied upon in the theory of change. Let’s
discuss!
From the thesis:
Proof certificates are a quite broad concept, introduced by[33] : a
certifying algorithm is defined as one that produces enough metadata
about its answer that the answer’s correctness can be checked by an
algorithm which is so simple that it is easy to understand and to
formally verify by hand.
A certifying algorithm is an algorithm that produces, with each
output, a certificate or witness (easy-to-verify proof) that the
particular output has not been compromised by a bug. A user of a
certifying algorithm inputs x, receives the output y and the
certificate w, and then checks, either manually or by use of a
program, that w proves that y is a correct output for input x. In this
way, he/she can be sure of the correctness of the output without
having to trust the algorithm.
In this memo, I do an overview by distinguishing a proof cert from a
constructive proof, and dive shallowly into the paper. Then I ruminate
on how this might apply in AI contexts.
What’s the difference between a proof cert and a constructive proof?
If a cert is just a witness, you might think that proof certs are just
proofs from constructive math as distinct from classical math. For
example, under some assumptions and caveats (in the calculus/topology
setting) a function can be “known” to have a fixed point without us
methodically discovering them with any guarantees, while under other
assumptions and caveats (namely the lattice theoretic setting) we know
there’s a fixed point because we have a guaranteed procedure for
computing it. However, they go further: a certifying algorithm produces
with each output a cert/witness that the particular output has, in
the case of McConnell et al, “not been compromised by a bug”. This
isn’t a conceptual leap from constructive math, in which a proof is
literally a witness-building algorithm, it looks to me a bit like a memo
saying “btw, don’t throw out the witness” along with notes about
writing cheap verifiers that do not simply reimplement the input
program logic.
One way of looking at a certificate is that it’s dependent on
metadata. This may simply be something read off of an execution trace,
or some state that logs important parts of the execution trace to
construct a witness like in the bipartite test example. In the bipartite
test example, all you need is for the algorithm that searches for a
two-coloring or an odd cycle, and crucially will return either the
two-coloring or the odd cycle as a decoration on it’s boolean output
(where “true” means “is bipartite”). Then, a cheap verifier (or
checker) is none other than the pre-existing knowledge that
two-colorable and bipartite are equivalent, or conversely that the
existence of an odd cycle is equivalent to disproving bipartiteness.
Chapter 11 of [3] will in fact discuss certification and verification
supporting eachother, which seems to relax the constraint that a cheap
verifier doesn’t simply reimplement the certifier logic.
The main difference between a standard proof as in an applied type
theory (like lean or coq) and a proof cert is kinda cultural and not
fundamental, in that proof certs prefer to emphasize single IO pairs and
lean/coq proofs often like to emphasize entire input types. Just don’t
get confused on page 34 when it says “testing on all inputs”—it
means that a good certifying algorithm is a means to generate exhaustive
unit tests, not that the witness predicate actually runs on all inputs
at once. It seems like the picking out of only input of interest and not
quantifying over the whole input type will be the critical consideration
when we discuss making this strategy viable for learned components or
neural networks.
Formally:
Skip this if you’re not super compelled by what you know so far about
proof certificates, and jump down to the section on learned programs.
Consider algorithms α:X→Y. A precondition
ϕ:X→B and a postcondition
ψ:X→Y→B form an IO-spec or
IO-behavior. An extension of the output set Y⊥=Y∪{⊥}
is needed to account for when the precondition is violated. We have a
type W of witness descriptions. Finally, we describe the witness
predicate with
W:X→Y⊥→W→B
that says IO pair xy is witnessed by w. So when ψxy is true,
w must be a witness to the truth, else a witness to the falsehood.
McConnell et al distinguish witness predicates from strong witness
predicates, where the former can prove ¬ϕx∨ψxy but
the latter knows which disjunct.
A checker for W is an algorithm sending
xyw↦Wxyw. It’s desiderata / nice to haves are
correctness, runtime linear in input, and simplicity.
I’m not that confident this isn’t covered in the document somewhere,
but McConnell et al don’t seem paranoid about false functions. If Alice
(sketchy) claims that a y came from α on x, but she lied to
Bob and in fact ran α′, there’s no requirement for witness
predicates to have any means of finding out. (I have a loose intuition
that the proper adversarial adaptation of proof certs would be possible
on a restricted algorithm set, but impose a lot of costs). Notice that
W is with respect to a fixed x,
W′:∀(x:X),Y(X)→W(X)→B
would be a different approach! We should highlight that a witness
predicate only tells you about one input.
Potential for certificates about learning, learned artefacts, and learned artefacts’ artefacts
The programme thesis continues:
We are very interested in certificates, because we would like to rely
on black-box advanced AI systems to do the hard work of searching for
proofs of our desired properties, yet without compromising confidence
that the proofs are correct. In this programme, we are specifically
interested in certificates of behavioural properties of cyber-physical
systems (ranging from simple deterministic functions, to complex
stochastic hybrid systems incorporating nuances like nondeterminism
and partiality).
If AIs are developing critical software (like airplanes, cars, and
weaponry) and assuring us that it’s legit, introducing proof certs
removes trust from the system. The sense in which I think
Safeguarded AI primarily wants to use proof certs certainly doesn’t
emphasize certs of SGD or architectures, nor does it even emphasize
inference-time certs (though I think it’d separately be excited about
progress there). Instead, an artefact that is second order removed from
the training loop seems to be the target of proof certs, like a software
artefact written by a model but it’s runtime is not the inference time
of the model. I see proof certs in this context as primarily an
auditing/interpretability tool. There’s no reason a model should write
code that is human readable, constraining it to write human readable
code seems hard, letting the code be uninterpretable but forcing it to
generate interpretable artefacts seems pretty good! The part I’m mostly
not clear on is why proof certs would be better than anything else in
applied type theory (where proofs are just lambdas, which are very
decomposable- you can’t read them all at once but they’re easy to pick
apart) or model checking (specs are logical formulae, even if reading
the proof that an implementation obeys a spec is a little hairy).
It’s important that we’re not required to quantify over a whole type
to get some really valuable assurances, and this proof cert framework is
amenable to that. On the other hand, you can accomplish the same
relaxation in lean or coq, and we can really only debate over
ergonomics. I think ultimately, the merits of this approach vs other
approaches are going to be decided entirely on the relative ergonomics
of tools that barely exist yet.
Overall impression
McConnell et al is 90 pages, and I only spent a few hours with it. But I
don’t find it super impressive or compelling. Conceptually, a
constructive proof gets me everything I want out of proof certificates.
A tremendous value proposition would be to make these certs easier to
obtain for real world programs than what coq or lean could provide, but
I haven’t seen those codebases yet. There’s more literature I may look
at around proof-carrying code (PCC)[4] which is similar ideas, but I’m
skeptical that I’ll be terribly compelled since the insight “just
decorate the code with the proof” isn’t very subtle or difficult.
Moreover, this literature just seems kinda old I don’t see obvious
paths from it to current research.
Any tips for getting out of a “rise to the occasion mindset” and into a “sink to your training” mindset?
I’m usually optimizing for getting the most out of my A-game bursts. I want to start optimizing for my baseline habits, instead. I should cover the B-, C-, and Z-game; the A-game will cover itself.
Mathaphorically, “rising to the occasion” is taking a max of a max, whereas “sinking to the level of your habits” looks like a greatest lower bound.
I’m really tired of high functioning super smart “autism” like ok we all have madeup diagnoses—anyone with a IQ slightly above 90 knows that they can learn the slogans to manipulate gatekeepers to get performance enhancement, and they decide not to if they think theyre performing well enough already. That doesn’t mean “ADHD” describes something in the world. Similarly, there’s this drift of “autism” getting more and more popular. It’s obnoxious because labels and identities are obnoxious, but i only find it repulsive because of the general trend of articulate and charismatic minorities setting agendas that effect the less talkative (and worse off!) fellow minorities https://open.substack.com/pub/jessesingal/p/why-disability-advocates-are-trying?utm_source=share&utm_medium=android&r=5hj2m (I only read up to free tier, but I’ve seen a bunch of this stuff).
Maybe neuroscientists or psychologists have good reasons for this, but “autism” is the most immensely deranged word in the history of categories—what utility is a word that crosses absent minded professors and people who can’t conceptually distinguish a week from a month insofar as you can wordlessly elicit conceptual understanding from them???? If you worked at the dictionary factory and you tried to slip that word in, you’d be fired immediately. So why do psychologists or neuroscientists get away with this???
I’m a computer guy. I’m bad at social queues sometimes. I feel at home in lots of sperg culture. But leave me out of your stolen valor campaign. We’re fine, you guys—we’re smart enough to correct for most of the downsides.
For the record, to mods: I waited till after petrov day to answer the poll because my first guess upon receiving a message on petrov day asking me to click something is that I’m being socially engineered. Clicking the next day felt pretty safe.
“EV is measure times value” is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.
Like in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?
failure mode can be understood as trying to aristotle the problem, lack of experimentation
thinking about the nanotech ASI threat model, where it solves nanotech overnight and deploys adversarial proteins in all the bloodstreams of all the lifeforms.
These are sometimes justified by Drexler’s inside view of boundary conditions and physical limits.
But to dodge the aristotle problem, there would have to be an amount of bandwidth of what’s passing between sensors and actuators (which may roughly correspond to the number of do applications in pearl)
Can you use something like communication complexity https://en.wikipedia.org/wiki/Communication_complexity (between a system and an environment) to think about “lower bound on the number of sensor-actuator actions” mixed with sample complexity (statistical learning theory)
Like ok if you’re simulating all of physics you can aristotle nanotech, for a sufficient definition of “all” that you would run up against realizability problems and cost way more than you actually need to spend.
Like I’m thinking if there’s a kind of complexity theory of pearl (number of do applications needed to acquire some kind of “loss”), then you could direct that at something like “nanotech projects” to fermstimate the way AIs might tradeoff between applying aristotlean effort (observation and induction with no experiment) and spending sensor-actuator interactions (with the world).
There’s a scenario in the sequences if I recall correctly about which physics an AI infers from 3 frames of a video of an apple falling, and something about how security mindset suggests you shouldn’t expect your information-theoretic calculation that einsteinian physics is impossible to believe from the three frames to actually apply to the AI. Which is a super dumbed down way of opening up this sort of problem space.
Methods, famously, includes the line “I am a descendant of the line of Bacon”, tracing empiricism to either Roger (13th century) or Francis (16th century) (unclear which).
Though a cursory wikiing shows an 11th century figure providing precedents for empiricism! Alhazen or Ibn al-Haytham worked mostly optics apparently but had some meta-level writings about the scientific method itself. I found this shockingly excellent quote
The duty of the man who investigates the writings of scientists, if learning the truth is his goal, is to make himself an enemy of all that he reads, and … attack it from every side. He should also suspect himself as he performs his critical examination of it, so that he may avoid falling into either prejudice or leniency.
Should we do more to celebrate Alhazen as an early rationalist?
New discord server dedicated to multi-multi delegation research
DM me for invite if you’re at all interested in multipolar scenarios, cooperative AI, ARCHES, social applications & governance, computational social choice, heterogeneous takeoff, etc.
(side note I’m also working on figuring out what unipolar worlds and/or homogeneous takeoff worlds imply for MMD research).
Last time we discussed the difference between information and a question or a problem, and I suggested that the novelty-satisfied mode of information presentation isn’t as good as addressing actual questions or problems. In chapter 3 which I have not typed up thoughts about, A three step procedure is introduced
Topic: “I am studying …”
Question: ”… because I want to find out what/why/how …”
Significance: ”… to help my reader understand …”
As we elaborate on the different kinds of problems, we will vary this framework and launch exercises from it.
Some questions raise problems, others do not. A question raises a problem if not answering it keeps us from knowing something more important than its answer.
The basic feedback loop introduced in this chapter relates practical with conceptual problems and relates research questions with research answers.
Practical problem -> motivates -> research question -> defines -> conceptual/research problem -> leads to -> research answer -> helps to solve -> practical problem (loop)
What should we do vs. what do we know—practical vs conceptual problems
Opposite eachother in the loop are practical problems and conceptual problems. Practical problems are simply those which imply uncertainty over decisions or actions, while conceptual problems are those which only imply uncertainty over understanding. Concretely, your bike chain breaking is a practical problem because you don’t know where to get it fixed, implying that the research task of finding bike shops will reduce your uncertainty about how to fix the bike chain.
Conditions and consequences
The structure of a problem is that it has a condition (or situation) and the (undesirable) consequences of that condition. The consequences-costs model of problems holds both for practical problems and conceptual problems, but comes in slightly different flavors. In the practical problem case, the condition and costs are immediate and observed. However, a chain of “so what?” must be walked.
Readers judge the significance of your problem not by the cost you pay but by the cost they pay if you don’t solve it… To make your problem their problem, you must frame it from their point of view, so that they see its cost to them.
One person’s cost may be another person’s condition, so when stating the cost you ought to imagine a socratic “so what?” voice, forcing you to articulate more immediate costs until the socratic voice has to really reach in order to say that it’s not a real cost.
The conceptual problem case is where intangibles play in. The condition in that case is always the simple lack of knowledge or understanding of something. The cost in that case is simple ignorance.
Modus tollens
A helpful exercise is if you find yourself saying “we want to understand x so that we can y”, try flipping to “we can’t y if we don’t understand x”. This sort of shifts the burden on the reader to provide ways in which we can y without understanding x. You can do this iteratively: come up with _z_s which you can’t do without y, and so on.
Pure vs. applied research
Research is pure when the significance stage of the topic-question-significance frame refers only to knowing, not to doing. Research is applied when the significance step refers to doing. Notice that the question step, even in applied research, refers to knowing or understanding.
Connecting research to practical consequences
You might find that the significance stage is stretching a bit to relate the conceptual understanding gained from the question stage. Sometimes you can modify and add a fourth step to the topic-question-significance frame and make it into topic-conceptual question-conceptual significance-possible practical application. Splitting significance into two helps you draw reasonable, plausible applications. A claimed application is a stretch when it is not plausible. Note: the authors suggest that there is a class of conceptual papers in which you want to save practical implications entirely for the conclusion, that for a certain kind of paper practical applications do not belong in the introduction.
AI safety
One characterisitic of AI safety that makes it difficult both to do and interface with is the chains of “so what” are often very long. The path from deconfusion research to everyone dying or not dying feels like a stretch if not done carefully, and has a lot of steps when done carefully. As I mentioned in my last post, it’s easy to get sucked into the “novel information for it’s own sake” regime at least as a reader. More practical oriented approaches are perhaps those that seek new regimes for how to even train models, and the “so what?” is answered “so we have dramatically less OODR-failures” or something. The condition-costs framework seems really beneficial for articulating alignment agendas and directions.
Misc
“Researchers often begin a project without a clear idea of what the problem even is.”
Look for problems as you read. When you see contradictions, inconsistencies, incomplete explanations tentatively assume that readers would or should feel the same.
Ask not “Can I solve it?” but “will my readers think it ought to be solved?”
“Try to formulate a question you think is worth answering, so that down the road, you’ll know how to find a problem others think is worth solving.”
When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy to keep such a journal is kinda a hurdle (which is why products like https://fatebook.io are so good!).
I find that a nice midpoint between the full and correct internal track record practices (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self delusion) is talking to friends, because I think my memory of conversations that are had out loud with other people is more detailed and honest than my memory of things I’ve thought / used to think, especially when it’s a stressful and treacherous topic.[1]
Going over etiquette and the social contract, perhaps if it’s software specific it talks about minimal reproducers, whatever else the author thinks is involved.
A sketch I’m thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden, however, also don’t be paralyzed so budget some leeway to be less than maximally considerate in this way when you really need to.
… If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program control and one treatment) in our world modeling stack. When it comes to harm and safety, Prop and bool are definitely not rich enough.
Does anyone use vim / mouse-minimal browser? I like Tridactyl better than the other one I tried, but it’s not great when there’s a vim mode in a browser window everything starts to step on eachother (like in jupyter, colab, leetcode, codesignal)
Trydactyl is amazing. You can disable the mode on specific websites by running the blacklistadd command. If you have configured that already, these settings can also be saved in your config file. Here’s my config (though careful before copying my config. It has fixamo_quiet enabled, a command that got Tridactyl almost removed when it was enabled by default. You should read what it does before you enable it.)
I’m halfway through how to measure anything: cybersecurity, which doesn’t have a lot of specifics to cybersecurity and mostly reviews the first book. I never finished the first one, and it was about four years ago that I read the parts that I did.
I think for top of the funnel EA recruiting it remains the best and most underrated book. Basically anyone worried about any kind of problem will do better if they read it, and most people in memetically adaptive / commonsensical activist or philanthropic mindsets probably aren’t measuring enough.
However, the material is incredibly basic for someone who’s been hanging out with EAs or on LessWrong for even a little bit. You’ve already absorbed so much of it from the water supply.
What’s different there compared to the first book?
I read the first one and found it to resonate strongly, but also found my mental models to not fit well with the general thrust. Since then I’ve been studying stats and thinking more about measurement with the intent to reread the first book. Curious if the cybersecurity one adds something more though
In terms of the parts where the books overlap, I didn’t notice anything substantial. If anything the sequel is less, cuz there wasn’t enough detail to get into tricks like the equivalent bet test.
We can say “a monotonic map, Φ∈mono(QP) is a phenomenon of P as observed by Q”, then, emergence is simply the impreservation of joins.
Given preorders (P,≤P) and (Q,≤Q), we say a map in mono(QP) “preserves” joins (which, recall, are least upper bounds) iff ∀ab∈P,Φa∨QΦb=Φ(a∨Pb) where by “x=y” we mean x≤y∧y≤x.
Suppose Φ is a measurement taken from a particle. We would like for our measurement system to be robust against emergence, which is literally operationalized by measuring one particle, measuring another, then doing some operation on the two results and getting the exact same thing as you would have gotten if you smashed the particles together somehow before taking the (now, single) measurement. But we don’t always get what we want.
Indeed, for arbitrary preorders and monotone arrows, you can prove Φa∨QΦb≤QΦ(a∨Pb), which we interpret as saying “smashing things together before measuring gives you more information than measuring two things then somehow combining them”.
In the sequences community, emergence is a post-it note that says “you’re confused or uncertain, come back here to finish working later” (Eliezer, 2008 or whatever). In the applied category theory community, emergence is also a failure of understanding but the antidote, namely reductions to composition, is prescribed.
This is all in chapter 1 of seven sketches on compositionality by fong and spivak, citing a thesis by someone called adam.
Two premises of mine are that I’m more ambitious than nearly everyone I meet in meatspace and normal distributions. This implies that in any relationship, I should expect to be the more ambitious one.
I do aspire to be a nagging voice increasing the ambitions of all my friends. I literally break the ice with acquaintances by asking “how’s your master plan going?” because I try to create vibes like we’re having coffee in the hallway of a supervillain conference, and I like to also ask “what harder project is your current project a warmup for?”.
I’m mostly sure I want kids. I told a gf recently (who does not want kids) that if it seemed like someone would be a good coparent, but they made me less ambitious, I would accept the bargain. But what’s implicit premise here?
The premise is of course that in relationships, you drift toward the average of yourself and the other person. Is this plausibly true?
I think there’s a folk wisdom about friendships, which generalizes to romance, that you’re a weighted average of your influences, so you should exercise caution in picking your influences.
Also—autonomy to leave a deadend job and go to EA Hotel was an important part of my ability to cultivate ambition. What price should I put on giving up that autonomy?
However, according to Owain’s comment here, there’s not a super good reason to expect children to decrease ambition. But it’s complicated—that dataset doesn’t express parenting quality.
One comment you could make is “move to the bay and you’ll no longer be the most ambitious person you run into in meatspace”. I’m empirically not someone who needs to be surrounded by like minds in order to thrive, but plausibly like minds could still amplify me. (Separately, I think it’s important for everyone who can afford to not live in the bay to avoid living in the bay, because brain drain and complete absence of cool projects in non-bay cities seem really bad! But I understand that some people simply can’t be ambitious if they’re not getting social rewards for it)
I guess I wonder how best to cultivate ass-kicking, through the kind of automatic cultivation and habituation that comes built in to relationships.
I think 15-20% decrease in ambition is a reasonable price to pay for being a parent. I don’t know if that price is really exacted.
I’m not aware of a literature or a dialogue on what I think is a very crucial divide in longtermism.
In this shortform, I’m going to take a polarity approach. I’m going to bring each pole to it’s extreme, probably each beyond positions that are actually held, because I think median longtermism or the longtermism described in the Precipice is a kind of average of the two.
Negative longtermism is saying “let’s not let some bad stuff happen”, namely extinction. It wants to preserve. If nothing gets better for the poor or the animals or the astronauts, but we dodge extinction and revolution-erasing subextinction events, that’s a win for negative longtermism.
In positive longtermism, such a scenario is considered a loss. From an opportunity cost perspective, the failure to erase suffering or bring to agency and prosperity to 1e1000 comets and planets hurts literally as bad as extinction.
Negative longtermism is a vision of what shouldn’t happen. Positive longtermism is a vision of what should happen.
My model of Ord says we should lean at least 75% toward positive longtermism, but I don’t think he’s an extremist. I’m uncertain if my model of Ord would even subscribe to the formation of this positive and negative axis.
What does this axis mean? I wrote a little about this earlier this year. I think figuring out what projects you’re working on and who you’re teaming up with strongly depends on how you feel about negative vs. positive longtermism. The two dispositions toward myopic coalitions are “do” and “don’t”. I won’t attempt to claim which disposition is more rational or desirable, but explore each branch
When Alice wants future X and Bob wants future Y, but if they don’t defeat the adversary Adam they will be stuck with future 0 (containing great disvalue), Alice and Bob may set aside their differences and choose form a myopic coalition to defeat Adam or not.
Form myopic coalitions. A trivial case where you would expect Alice and Bob to tend toward this disposition is if X and Y are similar. However, if X and Y are very different, Alice and Bob must each believe that defeating Adam completely hinges on their teamwork in order to tend toward this disposition, unless they’re in a high trust situation where they each can credibly signal that they won’t try to get a head start on the X vs.Y battle until 0 is completely ruled out.
Don’t form myopic coalitions. A low trust environment where Alice and Bob each fully expect the other to try to get a head start on X vs.Y during the fight against 0 would tend toward the disposition of not forming myopic coalitions. This could lead to great disvalue if a project against Adam can only work via a team of Alice and Bob.
An example of such a low-trust environment is, if you’ll excuse political compass jargon, reading bottom-lefts online debating internally the merits of working with top-lefts on projects against capitalism. The argument for coalition is that capitalism is a formiddable foe and they could use as much teamwork as possible; the argument against coalition is historical backstabbing and pogroms when top-lefts take power and betray the bottom-lefts.
For a silly example, consider an insurrection against broccoli. The ice cream faction can coalition with the pizzatarians if they do some sort of value trade that builds trust, like the ice cream faction eating some pizza and the pizzatarians eating some ice cream. Indeed, the viciousness of the fight after broccoli is abolished may have nothing to do with the solidarity between the two groups under broccoli’s rule. It may or may not be the case that the ice cream faction and the pizzatarians can come to an agreement about best to increase value in a post-broccoli world. Civil war may follow revolution, or not.
Now, while I don’t support long reflection (TLDR I think a collapse of diversity sufficient to permit a long reflection would be a tremendous failure), I think elements of positive longtermism are crucial for things to improve for the poor or the animals or the astronauts. I think positive longtermism could outperform negative longtermism when it comes to finding synergies between the extinction prevention community and the suffering-focused ethics community. However, I would be very upset if I turned around in a couple years and positive longtermists were, like, the premiere face of longtermism. The reason for this is once you admit positive goals, you have to deal with everybody’s political aesthetics, like a philosophy professor’s preference for a long reflection or an engineer’s preference for moar spaaaace or a conservative’s preference for retvrn to pastorality or a liberal’s preference for intercultural averaging. A negative goal like “don’t kill literally everyone” greatly lacks this problem. Yes, I would change my mind about this if 20% of global defense expenditure was targeted at defending against extinction-level or revolution-erasing events, then the neglectedness calculus would lead us to focus the by comparison smaller EA community on positive longtermism.
The takeaway from this shortform should be that quinn thinks negative longtermism is better for forming projects and teams.
Writers can’t avoid creating some role for themselves and their readers, planned or not
Before considering the role you’re creating for your reader, consider the role you’re creating for yourself. Your broad options are the following
I’ve found some new and interesting information—I have information for you
I’ve found a solution to an important practical problem—I can help you fix a problem
I’ve found an answer to an important question—I can help you understand something better
The authors recommend assuming one of these three. There is of course a wider gap between information and the neighborhood of problems and questions than there is between problems and questions! Later on in chapter four the authors provide a graph illustrating problems and questions: Practical problem -> motivates -> Research question -> defines -> Conceptual/research problem. Information, when provided mostly for novelty, however, is not in this cycle. Information can be leveled at problems or questions, plays a role in providing solutions or answers, but can also be for “its own sake”.
I’m reminded of a paper/post I started but never finished, on providing a poset-like structure to capabilities. I thought it would be useful if you could give a precise ordering on a set of agents, to assign supervising/overseeing responsibilities. Looking back, providing this poset would just be a cool piece of information, effectively: I wasn’t motivated by a question or problem so much as “look at what we can do”. Yes, I can post-hoc think of a question or a problem that the research would address, but that was not my prevailing seed of a reason for starting the project. Is the role of the researcher primarily a writing thing, though, applying mostly to the final draft? Perhaps it’s appropriate for early stages of the research to involve multi-role drifting, even if it’s better for the reader experience if you settle on one role in the end.
Additionally, it occurs to me that maybe “I have information for you” mode just a cheaper version of the question/problem modes. Sometimes I think of something that might lead to cool new information (either a theory or an experiment), and I’m engaged moreso by the potential for novelty than I am by the potential for applications.
I think I’d like to become more problem-driven. To derive possibilities for research from problems, and make sure I’m not just seeking novelty. At the end of the day, I don’t think these roles are “equal” I think the problem-driven role is the best one, the one we should aspire to.
[When you adopt one of these three roles, you must] cast your readers in a complementary role by offering them a social contract: _I’ll play my part if you play yours … if you cast them in a role they won’t accept, you’re likely to lose them entirely… You must report your research in a way that motivates your readers to play the role you have imagined for them.
The three reader roles complementing the three writer roles are
Entertain me
Help me solve my practical problem
Help me understand something better
It’s basically stated that your choice of writer role implies a particular reader role, 1 mapping to 1, 2 mapping to 2, and 3 mapping to 3.
Role 1 speaks to an important difficulty in the x-risk, EA, alignment community; which is how not to get drawn into the phenomenal sensation of insight when something isn’t going to help you on a problem. At my local EA meetup I sometimes worry that the impact of our speaker events is low, because the audience may not meaningfully update even though they’re intellectually engaged. Put another way, intellectual engagement can be goodhartable, the sensation of insight can distract you from your resolve to shatter your bottlenecks and save the world if it becomes an end itself. Should researchers who want to be careful about this avoid the first role entirely? Should the alignment literature look upon the first reader role as a failure mode? We talk about a lot of cool stuff, it can be easy to be drawn in by the cool factor like some of the non-EA rationalists I’ve met at meetups.
I’m not saying reader role number two absolutely must dominate, because it can diverge from deconfusion which is better captured by reader role number three.
Division of labor between reader and writer, writer roles do not always imply exactly one reader role
Isn’t it the case that deconfusion/writer role three research can be disseminated to practical (as opposed to theoretical) -minded people, and then those people turn question-answer into problem-solution? You can write in the question-answer regime, but there may be that (rare) reader who interprets it in the problem-solution regime! This seems to be an extremely good thing that we should find a way to encourage. In general reading the drifts across multiple roles seems like the most engaged kind of reading.
He had become so caught up in building sentences that he had almost forgotten the barbaric days when thinking was like a splash of color landing on a page.
a B-valued quantifier is any function (A→B)→B, so when B is bool quantifiers are the functions that take predicates as input and return bool as output (same for prop). the standard max and min functions on arrays count as real-valued quantifiers for some index set A.
I thought I had seen ∀ as the max of the Prop-valued quantifiers, and exists as the min somewhere, which has a nice mindfeel since forall has this “big” feeling (if you determined for P:A→Prop that ∀P (of which ∀x:A,Px is just syntax sugar since the variable name x is irrelevant) by exhaustive checking, it would cost O(|A|) whereas ∃P would cost O(1) unless the derivation of the witness was dependent on size of domain somehow).
Incidentally however, in differentiable logic it seems forall is the “minimal expectation” and existential is the “maximal expectation”. Page 10 of the LDL paper, where a Emin(g(X)) is the limit as gamma goes to zero of ∫x∈Bγ(ming)p(x)g(x)dx, or the integral with respect to a γ-ball about the min of g rather than about the entire domain of g. os in this sense, the interpretation of a universally quantified prop is a minimal expectation, dual where existentially quantified prop is a maximal expectation.
I didn’t like the way this felt aesthetically, since as I said, forall feels “big” which mood-affiliates toward a max. But that’s notes from barely-remembered category theory I saw once. Anyway, I asked a language model and it said that forall is minimal because it imposes the strictest of “most conservative” requirement. so “max” in the sense of “exists is interpreted to maximal expectation” refers to maximal freedom.
Among monotonic, boolean quantifiers that don’t ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
consider how our nonconstructive existence proof of nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are P but for three or more players general sum games are NP-hard. I wonder if every nonconstructive existence proof is like this? In the sense of inducing a computational complexity exercise to find what class it’s in, before coming up with greedy heuristics to accomplish an approximate example in practice.
Quick version of conversations I keep having, might be worth a top level effortpost.
A prediction market platform giving granular permission systems would open up many use cases for many people
whistleblower protections at large firms, dating, project management and internal company politics—all userbases with underserved opinions about transparency. Manifold could pivot to this but have a lot of other stuff they could do instead.
Think about slack admins are confused about how to prevent some usergroups from @channel and discord admins aren’t.
Jargon is not due to status scarcity, but it sometimes makes unearned requests for attention
When you see a new intricate discipline, and you’re reticent to invest in navigating it, asking to be convinced that your attention has been earned is fine, but I don’t recall seeing a valid or interesting complaint about jargon that deviates from this.
Like most wide-scale social phenomena, jargon is shaped by multiple incentives, with a pretty wide variance in the narrowness of consumer (insider, outsider, elite, median) and type of value provided (clarity, obfuscation, reinforcement of values, chunking of concepts).
Undertstanding a field VERY OFTEN requires understanding the people and social structures that shape the field. Jargon is useful in this dimension, as well as the surface-level content of the jargon.
There’s a remarkable TNG episode about enfeeblement and paul-based threatmodels, if I recall correctly.
There’s a post-scarcity planet with some sort of Engine of Prosperity in the townsquare, and it doesn’t require maintenance for enough generations that engineering itself is a lost oral tradition. Then it starts showing signs of wear and tear...
If paul was writing this story, they would die. I think in the actual episode, there’s a disagreeable autistic teenager who expresses curiosity about the Engine mechanisms, and the grownups basically shame him, like “shut up and focus on painting and dancing”. I think the Enterprise crew bails them out by fixing the Engine, and leaving the kid with a lesson about recultivating engineering as a discipline and a sort of intergenerational cultural heritage and responsibility.
I probably saw it over 10 years ago, I haven’t looked it up yet. Man, this is a massive boon to the science-communication elements of threatmodeling, given that the state of public discussion seems to be little middle ground between unemployment and literally everyone literally dying. We can just point people to this episode! Any thoughts?
We need a cool one-word snappy thing to say for “just what do you think you know and how do you think you know it” or like “I’m requesting more background about this belief you’ve stated, if you have time”.
I want something that has the same mouthfeel as “roll to disbelieve” for this.
Would there be a way of estimating how many people within the amazon organization are fanatical about same day delivery ratio against how many are “just working a job”? Does anyone have a guess? My guess is that an organization of that size with a lot of cash only needs about 50 true fanatics, the rest can be “mere employees”. What do yall think?
I can’t really think of any research bearing on this, and unclear how you’d measure it anyway.
One way to go might be to note that there is a wide (and weird) variance between the efficiency of companies: market pressures are slack enough that two companies doing as far as can be told the exact same thing in the same geographic markets with the same inputs might be almost 100% different (I think was the range in the example of concrete manufacturing in one paper I read); a lot of that difference appears to be explainable by the quality of the management, and you can do randomized experiments in management coaching or intensity of management and see substantial changes in the efficiency of a company (Bloom—the other one—has a bunch of studies like this). Presumably you could try to extrapolate from the effects of individuals to company-wide effects, and define the goal of the ‘fanatical’ as something like ‘maintaining top-10% industry-wide performance’: if educating the CEO is worth X percentiles and hiring a good manager is worth 0.0Y percentiles and you have such and such a number of each, then multiply out to figure out what will bump you 40 percentiles from an imagined baseline of 50% to the 90% goal.
Another argument might be a more Fermi estimate style argument from startups. A good startup CEO should be a fanatic about something, otherwise they probably aren’t going to survive the job. So we can assume one fanatic at least. People generally talk about startups beginning to lose the special startup magic of agility, focus, and fanaticism at around Dunbar’s number level of employees like 300, or even less (eg Amazon’s two-pizza rule which is I guess 6 people?). In the ‘worst’ case that the founder has hired 0 fanatics, that implies 1 fanatic can ride herd over no more than ~300 people; in the ‘best’ case that he’s hired dozens, then each fanatic can only cover for more like 2 or 3 non-fanatics. I’m not sure how we should count Amazon’s employees: do the warehouse workers, often temps, really count? They are so micro-managed and driven by the warehouse operation that they hardly seem even relevant to the question. I can’t quickly find that number, just totals, but let’s say there’s like 100,000 non-warehouse-ish employees; at a 300:1 ratio, you’d need 333, and at 3:1, 33,333. The former might be feasible, the latter not so much. (And would explain why Amazon.com seems to be a gradually degrading shopping experience—so many ads! Why are there ads getting in my way when I’m trying to give you my money already, Amazon!)
I’m not sure “fanatical” is well-defined enough to mean anything here. I doubt there are any who’d commit terrorist acts to further same-day delivery. There are probably quite a few who believe it’s important to the business, and a big benefit for many customers.
You’re absolutely right that a lot of employees and contractors can be “mere employees”, not particularly caring about long-term strategy, customer perception, or the like. That’s kind of the nature of ALL organizations and group behaviors, including corporate, government, and social groupings. There’s generally some amount of influencers/selectors/visionaries, some amount of strategists and implementers, and a large number of followers. Most organizations are multidimensional enough that the same people can play different roles on different topics as well.
I don’t think it needs any true fanatics. It just needs incentives.
This isn’t to say there won’t be fanatics anyway. There probably aren’t many things that nobody can get fanatical about. This is even more true if they’re given incentives to act fanatical about it.
I don’t think it needs any true fanatics. It just needs incentives.
Sure, but the incentive structure needs continual maintenance to keep it aligned with or pointing at the goal, which naturally leads to the questions of how many people are needed to keep the structure pointing at the goal, and what the motivation of those people will be.
We need a name for the following heuristic, I think, I think of it as one of those “tribal knowledge” things that gets passed on like an oral tradition without being citeable in the sense of being a part of a literature. If you come up with a name I’ll certainly credit you in a top level post!
I heard it from Abram Demski at AISU′21.
Suppose you’re either going to end up in world A or world B, and you’re uncertain about which one it’s going to be. Suppose you can pull lever LA which will be 100 valuable if you end up in world A, or you can pull lever LB which will be 100 valuable if you end up in world B. The heuristic is that if you pull LA but end up in world B, you do not want to have created disvalue, in other words, your intervention conditional on the belief that you’ll end up in world A should not screw you over in timelines where you end up in world B.
This can be fully mathematized by saying “if most of your probability mass is on ending up in world A, then obviously you’d pick a lever L such that V(L|A) is very high, just also make sure that V(L|B)>=0 or creates an acceptably small amount of disvalue.”, where V(L|A) is read “the value of pulling lever L if you end up in world A”
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable? Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
100 and 0 in this context make sense. Or at least in my initial reading: arbitrarily-chosen values that are in a decent range to work quickly with (akin to why people often work in percentages instead of 0..1)
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable?
It is—I’m going to say “often”, although I am aware this is suboptimal phrasing—often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
As such, you can often end up with discontinuities at zero.
Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
Dropping the entire probability distribution of outcomes through your utility function doesn’t even necessarily have a closed-form result. In a universe where computation itself is a cost, finding a cheaper heuristic (and working through if said heuristic has any particular basis or problems) can be valuable.
The heuristic in the grandparent comment is just what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
It is often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
This heuristic is what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
I’m not sure I understand. If the lever is +100 in world A and −90 in world B, it seems like a good bet if you don’t know which world you’re in. Or is that what you mean by “acceptably small amount of disvalue”?
Obviously there are considerations downstream of articulating this, one is that when P(A)>P(B) but V(LA|A)<V(LB|B) so it’s reasonable to hedge on ending up in world B even though it’s not strictly more probable than ending up in world A.
I think one of the most crucial meta skills i’ve developed is honing my sense of who’s criticizing me vs. who’s complaining.
A criticism is actionable, implicitly often it’s from someone who wants you to win. A complaint is when you can’t figure out how you’d actionably fix something or improve based on what you’re being told.
This simple binary story is problematic. It can empower you to ignore criticism you don’t like by providing a set of excuses, if you’re not careful. Sometimes it’s operationally impossible to parse out a criticism that runs so deep that it unsettles your premises from a complaint! I think people who are building things can be excused for ignoring advice if the only actionable way of accepting that advice is to completely overhaul their approach, for reasons of focus and other logistical concerns. If it’s that rare time in a project when you are going back to the drawing board and starting over, that’s definitely time to mine complaints for useful insight.
Related: the legend of the amazon customer in the 90s who was insatiably filling out customer feedback forms, to the point where 2000s or 2010s amazon named a boardroom after him. The idea was that this guy helped them improve a lot—surely it would have been easy to dismiss him as a complainer, but they didn’t, they found actionable advice within the complaints. I think your ability to take something that isn’t intended to help you, isn’t actionable on it’s face, and mining it for actionable insight can be very important. But for filtering, for attention, for sanity, dismissing something quickly because it doesn’t seem like it can help you or the project improve can be valid as well.
Disvalue via interpersonal expected value and probability
My deontologist friend just told me that treating people like investments is no way to live. The benefits of living by that take are that your commitments are more binding, you actually do factor out uncertainty, because when you treat people like investments you always think “well someday I’ll no longer be creating value for this person and they’ll drop me from their life”. It’s hard to make long term plans, living like that.
I’ve kept friends around out of loyalty to what we shared 5-10 years ago while questioning an expected value theory or probability theory based value prop. So I’m not, like, super guilty of this or anything. But overall I do take expected value theory and probability theory into interpersonal matters, and I don’t object when others do the same for me. Though it’s hard sometimes, I think it’s basically fine if someone drops me because I’m not adding value for them. An edge case in the opposite direction is that you’re obligated to build deep friendships with every acquaintance, which is also a little silly. But a sweet spot, like a marriage or other way of teaming up (like for a project) might meaningfully call for a suspension of expected value theory and probability theory.
One thing to be careful about in such decisions—you don’t know your own utility function very precisely, and your modeling of both future interactions and your value from such are EXTREMELY lossy.
The best argument for deontological approaches is that you’re running on very corrupt hardware, and rules that have evolved and been tested over a long period of time are far more trustworthy than your ad-hoc analysis which privileges obvious visible artifacts over more subtle (but often more important) considerations.
I may refine this into a formal bounty at some point.
I’m curious if censorship would actually work in the context of blocking deployment of superpowerful AI systems. Sometimes people will mention “matrix multiplication” as a sort of goofy edge case, which isn’t very plausible, but that doesn’t mean there couldn’t be actual political pressure to censor it. A more plausible example would be attention. Say the government threatens soft power against arxiv if they don’t pull attention is all you need, or threatens soft power against harvard if their linguistics department doesn’t pull the pytorch-annotated attention is all you need. By this point, it goes without saying that black hat hackers writing down the equations would face serious consequences if they got caught. Now instead of attention, imagine some more galaxy-brained paper or insight that gets published in 2028 and is an actual missing ingredient to advanced AI (assuming you’re not one of the people who think attention is all you need already is that paper).
While it’s certainly a research project to look at pros and cons of this approach to safety from AI, I think before that we need someone to profile efficacy of technological censorship through history to come at an estimate of how well this would work, i.e., how well it would actually slow or stop the propagation of this information, how well it would slow or stop the deployment of systems based on that information.
My guess at who the ideal person to execute on this bounty would be some patent law nerd, tho I’m sure a variety of types of nerd could do a great job.
any literature on estimates of social impact of businesses divided by their valuations?
the idea that dollars are a proxy for social impact is neat, but leaves a lot of room for goodhart and I think it’s plausible that they diverge entirely in cases. It would be useful to know, if possible to know, what’s going on here.
Why have I heard about Tyson investing into lab grown, but I haven’t heard about big oil investing in renewable?
Tyson’s basic insight here is not to identify as “an animal agriculture company”. Instead, they identify as “a feeding people company”. (Which happens to align with doing the right thing, conveniently!)
It seems like big oil is making a tremendous mistake here. Do you think oil execs go around saying “we’re an oil company”? When they could instead be going around saying “we’re a powering stuff” company. Being a powering stuff company means you have fuel source indifference!
I mean if you look at all the money they had to spend on disinformation and lobbying, isn’t it insultingly obvious to say “just invest that money into renewable research and markets instead”?
Is there dialogue on this? Also, have any members of “big oil” in fact done what I’m suggesting, and I just didn’t hear about it?
It seems like big oil is making a tremendous mistake here. Do you think oil execs go around saying “we’re an oil company”? When they could instead be going around saying “we’re a powering stuff” company. Being a powering stuff company means you have fuel source indifference!
The main problem is that prior investment into the oil method of powering stuff doesn’t translate into having a comparative advantage in a renewable way of powering stuff. They want a return on their existing massive investments.
While this looks superficially like a sunk cost fallacy, it isn’t. If a comparatively small investment (mere billions) can ensure continued returns on their trillions of sunk capital for another decade, it’s worth it to them.
Investment into renewable powering stuff would require substantially different skill sets in employees, in very different locations, and highly non-overlapping investment. At best, such an endeavour would constitute a wholly owned subsidiary that grows while the rest of the company withers. At worst, a parasite that hastens the demise of the parent while eventually failing in the face of competition anyway.
I’ve had a background assumption in my interpretation of and beliefs about reward functions for as long as I can remember (i.e. since first reading the sequences), that I suddenly realized I don’t believe is written down. Over the last two years I’ve gained experience writing coq sufficient to inspire a convenient way of framing it.
Computational vs axiomatic reward functions
Computational vs axiomatic in proof engineering
A proof engineer calls a proposition computational if it’s proof can be broken down into parts.
For example, a + (b + c) = (a + b) + c is computational because you can think of it’s proof as the application of the associativity lemma then the application of something called a “refl”, the fundamental termination of a proof involving equality. Passing around the associativity lemma is in a sense passing around it’s proof, which assuming a is inductive (take nat; zero and successor) is an application of nat’s induction principle, unpacking the recursive definition of +, etc.
In other words, if my adversary asks “why is a + (b + c) = (a + b) + c I can show them; I only have to make sure they agree to the fundamental definitions of nat and + : nat -> nat -> nat, the rest I can compel them to believe.
On the flip side, consider function extensionality, or f = g <-> forall x, f x = g x, not provable because we do not know that the domain of f (which equals the domain of g) is countable, to name but one scenario. Because they can’t prove it, theories “admit function extensionality as an axiom” from time to time.
In other words, if I invoke function extensionality in a proof, and my adversary has agreed to the basic type and function definitions, they remain entitled to reject my proof because if they ask why I believe function extensionality the best I can do is say “because I declared it on line 7”.
We do not call reasoning involving axioms computational. Instead, the discourse has sort of become poisoned by the axiom; it’s verificational properties have become weaker. (Intuitively, I can declare on line 7 anything I want; the risk of proving something that is actually false increases a great deal with each axiom I declare).
Apocryphally, a lecturer recalled a meeting perhaps of the univalent foundations group at IAS, when homotopy type theory (HoTT) was brand new (HoTT is based on something called univalence, which is about reasoning on type equalities in arbitrary “universes” (“kinds” for the haskell programmer)). In HoTT 1.0, univalence relied on an axiom (done carefully of course, to minimize the damage of the poison) and Per Martin-Lof is said to have remarked “it’s not really type theory if there’s an axiom”. HoTT 2.0 called cubical type theory repairs this, which is why cubical tt is sometimes called computational tt.
AIXI-like and AIXI-unlike AGIs
If the space of AGIs can be carved into AIXI-like and AIXI-unlike with respect to goals, clearly AIXI-like architectures have goals imposed on them axiomatically by the programmer. The complement of course is where the reward function is computational; decomposable.
See the NARS literature of Wang et. al. for something at least adjacent to AIXI-unlike—reasoning about NARS emphasizes that reward functions can be computational to an extent, but “bottom out” at atoms eventually. Still, NARS goals are computational to a far greater degree than AIXI-likes.
Conjecture: humans are AIXI-unlike AGIs
This should be trivial: humans can decompose their reward functions in ways richer than “because god said so”.
Relation to mutability???
If the space of AGIs can be carved into AIXI-like and human-like with respect to goals, does the computationality question help me reason about modifying my own reward function? Intuitively, AIXI’s axiomatic goal corresponds to immutability. However, I don’t think there’s a for-free implication that AIXI-unlikes get self-modification for-free. More work needed.
Position of this post in my overall reasoning
In general, my basic understanding that the AGI space can be divided into what I’ve called AIXI-like and AIXI-unlike with respect to how reward functions are reasoned about, and that computationality (anaxiomaticity vs axiomaticity?) is the crucial axis to view, is deeply embedded in my assumptions. Maybe writing it down will make eventually changing my mind about this easier: I’m uncertain just how conventional my belief/understanding is here.
I should be more careful not to imply I think that we have solid specimens of computational reward functions; more that I think it’s a theoretically important region of the space of possible minds, and might factor in idealizations of agency
I come to you with a dollar I want to spend on AI. You can allocate p pennies to go to capabilities and 100-p pennies to go to alignment, but only if you know of a project that realizes that allocation. For example, we might think that GAN research sets p = 98 (providing 2 cents to alignment) while interpretability research sets p = 10 (providing 90 cents to alignment).
Is this remotely useful? This is a really rough model (you might think it’s more of a venn diagram and that this model doesn’t provide a way of reasoning about the double counting problem).
a task: rate research areas, even whole agendas, with such value p. Many people may disagree about my example assignments to GANs and interpretability, or think both of those are too broad.
What are some alternatives to the splitting a dollar intuition?
To say something is capabilities-prone is less to say a dollar has been cleanly split, and more to say that there are some dynamics that sort of tend toward or get pushed toward different directions. Perhaps I want some sort of fluid metaphor instead.
Question your argument as your readers will—thoughts on chapter 10 of Craft of Research
Three predictable disagreements are
There are causes in addition to the one you claim
What about these counterexamples?
I don’t define X as you do, to me X means...
There are roughly two kinds of queries readers will have about your argument
intrinsic soundness—“challenging the clarity of a claim, relevance of reasons, or quality of evidence”
extrinsic soundness—“different ways of framing the problem, evidence you’ve overlooked, or what others have written on the topic.”
The idea is to anticipate, acknowledge, and respond to both kinds of questions. This is the path to making an argument that readers will trust and accept.
Voicing too many hypothetical objections up front can paralyze you. Instead, what you should do before anything else is focus on what you want to say. Give that some structure, some meat, some life. Then, an important exercise is to imagine readers’ responses to it.
I think cleaving these into two highly separated steps is an interesting idea, doing this with intention may be a valuable exercise next time I’m writing something.
View your argument through the eyes of someone who has a stake in a different outcome, someone who wants you to be wrong.
The authors provide some questions about your problem from a possible reader:
Why is your practical/conceptual solution better than others?
Then, they provide some questions about your support from a possible reader.
“I want to see a different kind of evidence” i.e. hard numbers over anecdotes / real people over cold numbers
“It isn’t accurate”
“It isn’t precise enough”
“It isn’t current”
“It isn’t representative”
“It isn’t authoritative”
“You need more evidence”
It builds credibility to play defense: to recognize your own argument’s limitations. It builds even more credibility to play offense: to explore alternatives to your argument and bring them into your reasoning. If you can, you might develop those alternatives in your own imagination, but more likely you’d like to find alternatives in your sources.
Often your readers will be likeyour sources’ authors; sometimes they may even include them.
What is the perfect amount of objections to acknowledge? Acknowledging too many can distract readers from the core of your argument, while acknowledging too few is a signal of laziness or even disrespect. You need to narrow your list of alternatives or objections by subjecting them to the following priorities
plausible charges of weaknesses that you can rebut
alternative lines of argument important in your field
alternative conclusions that readers want to be true
alternative evidence that readers know
important counterexamples that youu have to address.
What if your argument is flawed? The best thing to do is candidly acknowledge the issue and respond that...
the rest of your argument more than balances the flaw
while the flaw is serious, more research will show a way around it
while the flaw makes it impossible to accept your claim fully, your argument offers important insight into the question and suggests what a better answer would need.
It is wise to build up good faith by acknowledging questions you can’t answer. Concessions are often interpreted as positive signals by the reader.
It is important for your responses to acknowledgments to be subordinate to your main point, or else the reader will miss the forest for the trees.
Remember to make an intentional decision about how much credence to give to an objection or alternative. Weaker ones imply weaker credences, imply less effort in your acknowledgment and response.
there’s a gap in my inside view of the problem, part of me thinks that capabilities progress such as out-of-distribution robustness or the 4 tenets described in open problems in cooperative ai is necessary for AI to be transformative, i.e. a prereq of TAI, and another part of me that thinks AI will be xrisky and unstable if it progresses along other aspects but not along the axis of those capabilities.
There’s a geometry here of transformative / not transformative cross product with dangerous not dangerous.
To have an inside view I must be able to adequately navigate between the quadrants with respect to outcomes, interventions, etc.
If something can learn fast enough, then it’s out-of-distribution performance won’t matter as much. (OOD performance will still matter -but it’ll have less to learn where it’s good, and more to learn where it’s not.*)
*Although generalization ability seems like the reason learning matters. So I see why it seems necessary for ‘transformation’.
missed opportunities to build a predictive track record and trump
I was reminiscing about my prediction market failures, the clearest “almost won a lot of mana dollars” (if manifold markets had existed back then) was this executive order. The campaign speeches made it fairly obvious, and I’m still salty about a few idiots telling me “stop being hysterical” when I accused him of being what he’s writing on the tin that he is pre inauguration even though I overall reminisce that being a time when my epistemics were way worse than they are now.
However, there does seem like there needs to be a word for “lack of shock but failed to predict concretely”. We were threatmodeling a ton of crazy stuff back then! So what if you can econo-splain “well if you didn’t predict concretely then you were, by definition, shocked”, the more useful and accurate thing sounds more like “we were worried about various classes of populist atrocities, some of which would look hysterical in hindsight, those which would look hysterical in hindsight crowded out the ability to write detailed executive orders just to win the mana dollars / bayes points / etc.”. Early onsets of a populist swing are so anxiety-inducing and chaotic, I forgive myself for making an at least token attempt at security mindset by thinking about how bad it could get, but I shouldn’t do so too quickly—a post manifold markets populist would give me a great opportunity to take things seriously, put a little of that anxiety to use.
So of course, what is the institutional role of metaculus or manifold in the leadup to january 6 2021, or things in that reference class? Again, “didn’t write down a detailed description of what would happen, but isn’t shocked when it does”. It cost 0 IQ points to observe in the months leading up to the election that the administration would be a sore loser in worlds where they lost. So why is it so subtle to leverage this observation to gain actual mana dollars or metaculus ranking? This seems like an open problem to me.
august 2024 guaranteed safe ai newsletter
in case i forgot last month, here’s a link to july
A wager you say
One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock?
Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined authorship on the GSAI position paper) challenged Ben Goldhaber to a bet after Ben coauthored a post with Steve Omohundro. It seems to resolve in 2026 or 2027, the comment thread should get cleared up once Ben gets back from Burning Man. The arbiter is Raemon from LessWrong.
Zac says you can’t get a provably unpickable lock on this timeline. Zac gave (up to) 10:1 odds, so recall that the bet can be a positive expected value for Ben even if he thinks the event is most likely not going to happen.
For funsies, let’s map out one path of what has to happen for Zac to pay Ben $10k. This is not the canonical path, but it is a path:
Physics to the relevant granularity (question: can human lockpicks leverage sub-newtownian issues?) is conceptually placed into type theory or some calculus. I tried a riemann integral in coq once (way once), so it occurs to me that you need to decide if you want just the functional models (perhaps without computation / with proof irrelevance) in your proof stack or if you want the actual numerical analysis support in there as well.
Good tooling, library support, etc. around that conceptual work (call it mechlib) to provide mechanical engineering primitives
A lock designing toolkit, depending on mechlib, is developed
Someone (e.g. a large language model) is really good at programming in the lock designing toolkit. They come up with a spec L.
You state the problem “forall t : trajectories through our physics simulation, if L(t) == open(L) then t == key(L)”
Then you get to write a nasty gazillion line Lean proof
Manufacture a lock (did I mention that the design toolkit has links to actual manufacturing stacks?)
Bring a bunch to DefCon 2027 and send another to the lockpicking lawyer
Everyone fails. Except Ben and the army of postdocs that $9,999 can buy.
Looks like after the magnificent research engineering in steps 1 and 2, the rest is just showing off and justifying those two steps. Of course, in a world where we have steps 1 and 2 we have a great deal of transformative applications of formal modeling and verification just in reach, and we’ll need a PoC like locks to practice and concretize the workflow.
Cryptography applications tend to have a curse of requiring a lot of work after the security context, permission set, and other requirements are frozen in stone, which means that when the requirements change you have to start over and throw out a bunch of work (epistemic status: why do you think so many defi projects have more whitepapers than users?). The provably unpickable lock has 2 to 10 x that problem– get the granularity wrong in step one, most of your mechlib implementation won’t be salvageable. As the language model iterates on the spec L in step 5, the other language model has to iterate on the proof in step 6, because the new spec will break most of the proof.
Sorry I don’t know any mechanical engineering, Ben, otherwise I’d take some cracks at it. The idea of a logic such that its denotation is a bunch of mechanical engineering primitives seems interesting enough that my “if it was easy to do in less than a year someone would’ve, therefore there must be a moat” heuristic is tingling. Perhaps oddly, the quantum semantics folks (or with HoTT!) seem to have been productive, but I don’t know how much of that is translatable to mechanical engineering.
Reinforcement learning from proof assistant feedback, and yet more monte carlo tree search
DeepSeek’s paper
The steps are pretraining, supervised finetuning, RLPAF (reinforcement learning from proof assistant feedback), and MCTS (monte carlo tree search). RLPAF is not very rich: it’s a zero reward for any bug at all and a one for a happy typechecker. Glad they got that far with just that.
You can use the model at deepseek.com.
Harmonic ships their migration of miniF2F to Lean 4, gets 90% on it, is hiring
From their “one month in” newsletter. “Aristotle”, which has a mysterious methodology since I’ve only seen their marketing copy rather than an arxiv paper, gets 90% on miniF2F 4 when prompted with natural language proofs. It doesn’t look to me like the deepseek or LEGO papers do that? I could be wrong. It’s impressive just to autoformalize natural language proofs, I guess I’m still wrapping my head around how much harder it is (for an LLM) to implement coming up with the proof as well.
Jobs: research engineer and software engineer
Atlas ships their big google doc alluded to in the last newsletter
Worth a read! The GSAI stack is large and varied, and this maps out the different sub-sub-disciplines. From the executive summary:
You could start whole organizations for every row in this table, and I wouldn’t be a part of any org that targets more than a few at once for fear of being unfocused. See the doc for more navigation (see what I did there? Navigating like with an atlas, perhaps? Get it?) of the field’s opportunities.[1]
Efficient shield synthesis via state-space transformation
Shielding is an area of reactive systems and reinforcement learning that marks states as unsafe and synthesizes a kind of guarding layer between the agent and the environment that prevents unsafe actions from being executed in the environment. So in the rejection sampling flavored version, it literally intercepts the unsafe action and tells the agent “we’re not running that, try another action”. One of the limitations in this literature is computational cost, shields are, like environments, state machines plus some frills, and there may simply be too many states. This is the limitation that this paper focuses on.
Besides cost, demanding a lot of domain knowledge is another limitation of shields, so this is an especially welcome development.
Funding opportunities
ARIA jumped right to technical area three (TA3), prototyping the gatekeeper. Deadline October 2nd. Seems geared toward cyber-physical systems folks. In the document:
This is really cool stuff, I hope they find brave and adventurous teams. I had thought gatekeeper prototypes would be in minecraft or mujoco (and asked a funder if they’d support me in doing that), so it’s wild to see them going for actual cyberphysical systems so quickly.
Paper club
Add to your calendar. On September 19th we will read a paper about assume-guarantee contracts with learned components. I’m liable to have made a summary slide deck to kick us off, but if I don’t, we’ll quietly read together for the first 20-30 minutes then discuss. The google meet room in the gcal event by default.
Andrew Dickson’s excellent post
See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than the fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swisscheesey worldview, and sounded too much like “why not just capture alignment properties in ‘specs’ and prove the software ‘correct’?” (i.e. the vibe I was responding to in my very pithy post). However, I think my main reason I’m not using Dickson’s post as a reason to just pivot all my worldview and resulting research is captured in one of Steve’s comments:
Like, a very strong version I almost endorse is “GSAI isn’t about AI at all, it’s about systems coded by extremely powerful developers (which happen to be AIs)”, and ensuring safety, security, and reliability capabilities scale at similar speeds with other kinds of capabilities.
It looks like one can satisfy Dickson just by assuring him that GSAI is a part of a swiss cheese stack, and that no one is messianically promoting One Weird Trick To Solve Alignment. Of course, I do hope that no one is messianically promoting One Weird Trick…
One problem off the top of my head regarding the InterFramework section: Coq and Lean seems the most conceptually straightforward since they have the same underlying calculus, but even there just a little impredicativity or coinduction could lead to extreme headaches. Now you can have a model at some point in the future that steamrolls over these headaches, but then you have a social problem of the broader Lean community not wanting to upstream those changes– various forks diverging fundamentally seems problematic to me, would lead to a lot of duplicated work and missed opportunities for collaboration. I plan to prompt Opus 3.5 with “replicate flocq in lean4” as soon as I get access to the model, but how much more prompting effort will it be to ensure compliance with preexisting abstractions and design patterns, so that it can not only serve my purposes but be accepted by the community? At least there’s no coinduction in flocq, though some of the proofs may rely on set impredicativity for all I know (I haven’t looked at it in a while).
Oh, I liked this one. Mind if I copy it into your shortform (or at least like the first few paragraphs so people can get a taste?)
By all means. Happy for that
Good arguments—notes on Craft of Research chapter 7
Arguments take place in 5 parts.
This can be modeled as a conversation with readers, where the reader prompts the writer to taking the next step on the list.
Claim ought to be supported with reasons. Reasons ought to be based on evidence. Arguments are recursive: a part of an argument is an acknowledgment of an anticipated response, and another argument addresses that response. Finally, when the distance between a claim and a reason grows large, we draw connections with something called warrants.
The logic of warrants proceeds in generalities and instances. A general circumstance predictably leads to a general consequence, and if you have an instance of the circumstance you can infer an instance of the consequence.
Arguing in real life papers is complexified from the 5 steps, because
Claims should be supported by two or more reasons
A writer can anticipate and address numerous responses. As I mentioned, arguments are recursive, especially in the anticipated response stage, but also each reason and warrant can necessitate a subargument.
Thinking about a top-level post on FOMO and research taste
Fear of missing out defined as inability to execute on a project cuz there’s a cooler project if you pivot
but it also gestures at more of a strict negative, where you think your project sucks before you finish it, so you never execute
was discussing this with a friend: “yeah I mean lesswrong is pretty egregious cuz it sorta promotes this idea of research taste as the ability to tear things down, which can be done armchair”
I’ve developed strategies to beat this FOMO and gain more depth and detail with projects (too recent to see returns yet, but getting there) but I also suspect it was nutritious of me to develop discernment about what projects are valuable or not valuable for various threat models and theories of change (in such a way that being a phd student off of lesswrong wouldn’t have been as good in crucial ways, tho way better in other ways).
but I think the point is you have to turn off this discernment sometimes, unless you want to specialize in telling people why their plans won’t work, which I’m more dubious on the value of than I used to be
Idk maybe this shortform is most of the value of the top level post
A trans woman told me
And another trans woman had told me almost the exact same thing a couple months ago.
My take is that roles have upsides and downsides, and that you’ll do a bad job if you try to say one role is better or worse than another on net or say that a role is more downside than upside. Also, there are versions of “women talk too much” as a stereotype in many subcultures, but I don’t have a good inside view about it.
This may be true, but it might be that she’s incurring a bunch of social penalities she isn’t aware of. Women are less likely to overtly punish, so if she’s spending more time with women that could already explain it. No one yells at you to STFU, but you miss out on party invite you would have gotten if you shared the conversation better.
I suspect men are also more willing to tell other men to STFU than they are to say it to women, but will let someone else speak to that question.
The fact that both roles have advantages and disadvantage doesn’t necessarily prove that neither is better on net. Then again, “better” by what preferences? Lucky are the people whose preferences match the role they were assigned.
To me it seems that women have a greater freedom of self-expression, as long as they are not competitive. Men are treated instrumentally: they are socially allowed to work and to compete against each other, anything else is a waste of energy. For example, it is okay for a man to talk a lot, if he is a politician, manager, salesman, professor, priest… simply, if it is a part of his job. And when he is seducing a woman. Otherwise, he should be silent. Women are expected to chit-chat all the time, but they should never contradict men, or say anything controversial.
one may be net better than the other, I just think the expected error washes out all of one’s reasoning so individuals shouldn’t be confident they’re right.
thoughts on chapter 9 of Craft of Research
We saw previously that claims ought to be supported with reasons, and reasons ought to be based on evidence. Now we will look closer at reasons and evidence.
Reasons must be in a clear, logical order. Atomically, readers need to buy each of your reasons, but compositionally they need to buy your logic. Storyboarding is a useful technique for arranging reasons into a logical order: physical arrangements of index cards, or some DAG-like syntax. Here, you can list evidence you have for each reason or, if you’re speculating, list the kind of evidence you would need.
When storyboarding, you want to read out the top level reasons as a composite entity without looking at the details (evidence), because you want to make sure the high-level logic makes sense.
I think there is a contract between you and the reader. You must agree to cite sources that are plausibly truthful, and your reader must agree to accept that these sources are reliable. A diligent and well-meaning reader can always second-guess whether, for instance, the beureau of subject matter statistics is collecting and reporting data correctly, but at a certain point this violates the social contract. If they’re genuinely curious or concerned, it may fall on them to investigate the source, not on you. The bar you need to meet is that your sources are plausibly trustworthy. The book doesn’t talk much about this contract, so there’s little I can say about what “plausible” means.
Sometimes you have to be extra careful to distinguish reasons from evidence, a
(<claim>, <reason>, <evidence>)
tuple is subject to regress in the latter two components,(A, B, C)
may need to be justified by(B, C, D)
and so on. The example given of this regress is if I told you(american higher education must curb escalating tuition costs, because the price of college is becoming an impediment to the american dream, today a majority of students leave college with a crushing debt burden)
. In the context of this sentence, “a majority of students...” is evidence, but it would be reasonable to ask for more specifics. In principle, any time information is compressed it may be reasonable to ask for more specifics. A new tuple might look like(the price of college is becoming an impediment to the american dream, because today a majority of students leave college with a crushing debt burden, in 2013 nearly 70% of students borrowed money for college with loans averaging $30000...)
. The third component is still compressing information, but it’s not in the contract between you and the reader for the reader to demand the raw spreadsheet, so this second tuple might be a reasonable stopping point of the regress.Sometimes you have to be careful to distinguish evidence from reports of it. Again, because we are necessarily dealing with compressed information, we can’t often point directly to evidence. Even a spreadsheet, rather than summary statistics of it, is a compression of the phenomena in base reality that it tracks.
There is a criteria you want to screen your evidence with respect to.
sufficient
representative
accurate
precise
authoritative
Being honest about the reliability and prospective accuracy of evidence is always a positive signal. Evidence can be either too precise or not precise enough. The women in one or two of Shakespeare’s plays do not represent all his women, they are not representative. Figure out what sorts of authority signals are considered credible in your community, and seek to emulate them.
Sources—notes on Craft of Research chapters 5 and 6
Primary, secondary, and tertiary sources
The distinction between primary and secondary sources comes from 19th century historians, and the idea of tertiary sources came later. The boundaries can be fuzzy, and are certainly dependent on the task at hand.
I want to reason about what these distinctions look like in the alignment community, and whether or not they’re important.
The rest of chapter five is about how to use libraries and information technologies, and evaluating sources for relevance and reliability.
Chapter 6 starts off with the kind of thing you should be looking for while you read
Look for creative agreement
Offer additional support. You can offer new evidence to support a source’s claim.
Confirm unsupported claims. You can prove something that a source only assumes or speculates about.
Apply a claim more widely. You can extend a position.
Look for creative disagreement
Contradictions of kind. A source says something is one kind of thing, but it’s another.
Part-whole contradictions. You can show that a source mistakes how the parts of something are related.
Developmental or historical contradictions. You can show that a source mistakes the origin or development of a topic.
External cause-effect contradictions. You can show that a source mistakes a causal relationship.
Contradictions of perspective. Most contradictions don’t change a conceptual framework, but when you contradict a “standard” view of things, you urge others to think in a new way.
The rest of chapter 6 is a few more notes about what you’re looking for while reading (evidence, reasons), how to take notes, and how to stay organized while doing this.
The alignment community
I think I see the creative agreement modes and the creative disagreement modes floating around in posts. Would it be more helpful if writers decided on one or two of these modes before sitting down to write?
Moreover, what is a primary source in the alignment community? Surely if one is writing about inner alignment, a primary source is the Risks from Learned Optimization paper. But what are Risks’ primary, secondary, tertiary sources? Does it matter?
Now look at Arbital. Arbital started off to be a tertiary source, but articles that seemed more like primary sources started appearing there. I remember distinctively thinking “what’s up with that?” it struck me as awkward for Arbital to change it’s identity like that, but I end up thinking about and citing the articles that seem more like primary sources.
There’s also the problem of stuff in the memeplex not written down is the real “primary” source while the first person who happens to write it down looks like they’re writing a primary source when in fact what they’re doing is really more like writing a secondary or even tertiary source.
Yesterday I quit my job for direct work on epistemic public goods! Day one of direct work trial offer is April 4th, and it’ll take 6 weeks after that to know if I’m a fulltime hire.
I’m turning down
raise to 200k/yr usd
building lots of skills and career capital that would give me immense job security in worlds where investment into one particular blockchain doesn’t go entirely to zero
having fun on the technical challenges
for
confluence of my skillset and a theory of change that could pay huge dividends in the epistemic public goods space
0.35x paycut from my upcoming raise
uncertainty of it being a trial offer.
having fun on the technical challenges
Which I’m flagging in such detail to give you strength if you’re ever reasoning about your risk tolerance and your goals, just remember, “look at what quinn did!”
did anyone draw up an estimate of how much the proportion of code written by LLMs will increase? or even what the proportion is today
How are people mistreated by bellcurves?
I think this is a crucial part of a lot of psychological maladaption and social dysfunction, very salient to EAs. If you’re way more trait xyz than anyone you know for most of your life, your behavior and mindset will be massively effected, and depending on when in life / how much inertia you’ve accumulated by the time you end up in a different room where suddenly you’re average on xyz, you might lose out on a ton of opportunities for growth.
In other words, the concept of “big fish small pond” is deeply insightful and probably underrated.
Some IQ-adjacent idea is sorta the most salient to me, since my brother recently reminded me “quinn is the smartest person I know”, to which I was like, you should meet smarter people? Or I kinda did feel unusually smart before I was an EA, I can only reasonably claim to be average if you condition on EA or something similar. But this post is extremely important in terms of each of the Big 5, “grit”-adjacent things, etc.
For example, when you’re way more trait xyz than anyone around you, you form habits around adjusting for people to underperform relative to you at trait xyz. Sometimes those habits run very deep in your behavior and wordview, and sometimes they can be super ill-tuned (or at least a bit suboptimal) to becoming average. Plus, you develop a lot of “I have to pave my own way” assumptions about growth and leadership. Related to growth, you may cultivate lower standards for yourself than you otherwise might have. Related to leadership, I expect many people in leader roles at small ponds would be more productive, impactful, and happy if they had access to averageness. Pond size means they don’t get that luxury!
There’s a tightly related topic about failure to abolish meatspace / how you might think the internet corrects for this but later realize how much it doesn’t.
So, being a “big fish in a small pond” teaches you habits that become harmful when you later move to a larger pond. But if you don’t move, you can’t grow further.
I think the specific examples are more known that the generalization. For example:
Many people in Mensa are damaged this way. They learned to be the smartest ones, which they signal by solving pointless puzzles, or by talking about “smart topics” (relativity, quantum, etc.) despite the fact that they know almost nothing about these topics. Why did they learn these bad habits? Because this is how you most efficiently signal intelligence to people who are not themselves intelligent. But it fails to impress the intelligent people used to meeting other intelligent people, because they see the puzzles as pointless, they see the smart talk as bullshit if they ever read an introductory textbook on the topic, and will ask you about your work and achievements instead. The useful thing would instead be to learn how to cooperate with other intelligent people on reaching worthy goals.
People who are too smart or too popular at elementary school (or high school) may be quite shocked when they move to a high school (or university) and suddenly their relative superpowers are gone. If they learned to rely on them too much, they may have a problem adapting to normal hard work or normal friendships.
Staying at the same job for too long might have a similar effect. You feel like an expert because you are familiar with all systems in the company. Then at some moment fate makes you change jobs, and suddenly you realize that you know nothing, that the processes and technologies used in your former company were maybe obsolete. But the more you delay changing jobs, the harder it becomes.
I remember reading in a book by László Polgár, father of the famous female chess players, how he wanted his girls to play in the “men’s” chess league since the beginning, because that’s what he wanted them to win. He was afraid that playing in smaller leagues would learn them habits useful only for the smaller leagues. Technically, the “men’s” chess league was open for everyone, but because there were no women among the winners (yet), a separate league only for women was made. Polgár did not want his girls to compete in the league for women, and that offended many people.
From evolutionary perspective, when people lived in small tribes, if you were the best in your tribe, you remained the best in your tribe (maybe until someone younger than you outcompeted you a few years later). So it made sense to adapt to the situation you had. Our society is weirdly organized from this perspective—as an adult, you will be pushed to compete against the best (sometimes literally in the entire world), and yet as a small child you are put into an elementary school with average kids, where you get the wrong expectations of your future environment. A partial antidote to that are various competitions, where you can compete against similarly talented kids from other schools, so even if you are by far the best at your school, you still know there is much to learn.
I think this wasn’t true at the time, at least in Hungary. The oldest sister and their father spent a lot of time fighting this, so it was ~true by the time the youngest sister got really competitive. This might prove the larger point, since the youngest sister also went the farthest.
Uh, good catch! Then I am surprised that they actually succeeded to win this. It would be too easy and possibly very tempting to just say “you broke the rules, disqualified!” Or at least, I would expect a debate to last for a decade, and then it would be too late for the Polgár sisters.
yeah IQ ish things or athletics are the most well-known examples, but I only generalized in the shortform cuz I was looking around at my friends and thinking about more Big Five oriented examples.
Certainly “conscientiousness seems good but I’m exposed to the mistake class of unhelpful navelgazing, so maybe I should be less conscientious” is so much harder to take seriously if you’re in a pond that tends to struggle with low conscientiousness. Or being so low on neuroticism that your redteam/pentest muscles atrophy.
That sounds intriguing. I would like to read an article with many specific (even if fictional) examples.
nonprosaic ai will not be on short timelines
I think a property of my theory of change is that academic and commercial speed is a bottleneck. I recently realized that my mass assignment for timelines synchronized with my mass assignment for the prosaic/nonprosaic axis. The basic idea is that let’s say a radical new paper that blows up and supplants the entire optimization literature gets pushed to the arxiv tomorrow, signaling the start of some paradigm that we would call nonprosaic. The lag time for academics and industry to figure out what’s going on, figure out how to build on that result, for developer ecosystems to form, would all compound to take us outside of what we would call “short timelines”.
How flawed is this reasoning?
The reasoning assumes that ideas are first generated in academia and don’t arise inside of companies. With DeepMind outperforming the academic protein folding community when protein folding isn’t even the main focus of DeepMind I consider it plausible that new approaches arise within a company and get only released publically when they are strong enough to have an effect.
Even if there’s a paper most radical new papers get ignored by most people and it might be that in the beginning only one company takes the idea seriously and doesn’t talk about it publically to keep a competive edge.
That’s totally fair, but I have a wild guess that the pipeline from google brain to google products is pretty nontrivial to traverse, and not wholly unlike the pipeline from arxiv to product.
How short is “short” for you?
Like, AlexNet was 2012, DeepMind patented deep Q learning in 2014, the first TensorFlow release was 2015, the first PyTorch release was 2016, the first TPU was 2016, and by 2019 we had billion-parameter GPT-2 …
So if you say “Short is ≤2 years”, then yeah, I agree. If you say “Short is ≤8 years”, I think I’d disagree, I think 8 years might be plenty for a non-prosaic approach. (I think there are a lot of people for whom AGI in 15-20 years still counts as “short timelines”. Depends on who you’re talking to, I guess.)
I should’ve mentioned in OP but I was lowkey thinking upper bound on “short” would be 10 years.
I think developer ecosystems are incredibly slow (longer than ten years for a new PL to gain penetration, for instance). I guess under a singleton “one company drives TAI on its own” scenario this doesn’t matter, because tooling tailored for a few teams internal to the same company is enough which can move faster than a proper developer ecosystem. But under a CAIS-like scenario there would need to be a mature developer ecosystem, so that there could be competition.
I feel like 7 years from AlexNet to the world of PyTorch, TPUs, tons of ML MOOCs, billion-parameter models, etc. is strong evidence against what you’re saying, right? Or were deep neural nets already a big and hot and active ecosystem even before AlexNet, more than I realize? (I wasn’t paying attention at the time.)
Moreover, even if not all the infrastructure of deep neural nets transfers to a new family of ML algorithms, much of it will. For example, the building up of people and money in ML, the building up of GPU / ASIC servers and the tools to use them, the normalization of the idea that it’s reasonable to invest millions of dollars to train one model and to fab ASICs tailored to a particular ML algorithm, the proliferation of expertise related to parallelization and hardware-acceleration, etc. So if it took 7 years from AlexNet to smooth turnkey industrial-scale deep neural nets and billion-parameter models and zillions of people trained to use them, then I think we can guess <7 years to get from a different family of learning algorithms to the analogous situation. Right? Or where do you disagree?
No you’re right. I think I’m updating toward thinking there’s a region of nonprosaic short-timelines universes. Overall it still seems like that region is relatively much smaller than prosaic short-timelines and nonprosaic long-timelines, though.
Cope isn’t a very useful concept.
For every person who has a bad reason that they catch because you say “sounds like cope”, there are 10x as many people who find their reason actually compelling. Saying “if that was my reason it would be a sign I was in denial of how hard I was coping” or “I don’t think that reason is compelling” isn’t really relevant to the person you’re ostensibly talking to, who’s trying to make the best decisions for the best reasons. Just say you don’t understand why the reason is compelling.
I’m not sure I have experienced a “sounds like cope” reasoning, or at least it doesn’t match to discussions I’ve noted. Is this similar to “people under stress are bad at updating”? Why would you expect them to be better at communicating than they are at reasoning?
Excellence and adequacy
I asked a friend whether I should TA for a codeschool called ${{codeschool}}.
A hidden claim there that I would soak up the pursuit of non-excellence by proximity or osmosis isn’t what’s interesting (though I could see that turning out either way). What’s interesting is the value of non-excellence, which I’ll call adequacy.
${{codeschool}} in this case is effective and impactful at putting butts in seats at companies, and is thereby responsible for some negligible slice of economic growth. It’s students and instructors are plentiful with the virtue of getting things done, do they really need the virtue of high-craftsmanship? The student who reads SICP and TAPL because they’re pursuing mastery over the very nature of computation is strictly less valuable to the economy than the student who reads react tutorials because they’re pursuing some cash.
Obviously, my friend who was telling me this was of the SICP/TAPL type. In software, this is problematic: lisp and type theory will increase your thinking about the nature of computation, but will it increase your thinking about the social problem of steering a team? From an employer’s perspective, it is naive to prefer excellence over adequacy, it is much wiser to saddle the excellent person with the burden of proving that they won’t get bored easily.
Hufflepuffs can go far, and the fuel is adequacy. Enough competence to get it done, any more is egotistical, a sunk cost.
But what if it’s not about industry/markets, what if it’s about the world’s biggest problems? Don’t we want people who are more competent than strictly necessary to be working on them? Maybe, maybe not.
Related: explore/exploit, become great/become useful
For a long time I’ve operated in the excellence mindset: more energy for struggling with textbooks than for exploiting the skills I already have to ship projects and participate in the real world. Thinking it might be good to shift gears and flex my hufflepuff virtues more.
Seems to me that on the market there are very few jobs for the SICP types.
The more meta something is, the less of that is needed. If you can design an interactive website, there are thousands of job opportunities for you, because thousands of companies want an interactive website, and somehow they are willing to pay for reinventing the wheel. If you can design a new programming language and write a compiler for it… well, it seems that world already has too many different programming languages, but sure there is a place for maybe a dozen more. The probability of success is very small even if you are a genius.
The best opportunity for developers who think too meta is probably to design a new library for an already popular programming language, and hope it becomes popular. The question is how exactly you plan to get paid for that.
Probably another problem is that it requires intelligence to recognize intelligence, and it requires expertise to recognize expertise. The SICP type developer seems to most potential employers and most potential colleagues as… just another developer. The company does not see individual output, only team output; it does not matter that your part of code does not contain bugs, if the project as a whole does. You cannot use solutions that are too abstract for your colleagues, or for your managers. Companies value replaceability, because it is less fragile and helps to keep developer salaries lower than they might be otherwise. (In theory, you could have a team full of SICP type developers, which would allow them to work smarter, and yet the company would feel safe. In practice, companies can’t recognize this type and don’t appreciate it, so this is not going to happen.)
Again, probably the best position for a SICP type developer in a company would be to develop some library that the rest of the company would use. That is, a subproject of a limited size that the developer can do alone, so they are not limited in the techniques they use, as long as the API is comprehensible. Ah, but before you are given such opportunity, you usually have to prove yourself in the opposite type of work.
Sometimes I feel like having a university for software developers just makes them overqualified for the market. A vocational school focusing on the current IT hype would probably make most companies more happy. Also the developers, though probably only in short term, before a new hype comes and they face the competition of a new batch of vocational school graduates trained for the new hype. A possible solution for the vocational school would be to also offer retraining courses for their former students, like three or six months to become familiar with the new hype.
I used to think “community builder” was a personality trait I couldn’t switch off, but once I moved to the bay I realized that I was just desperate for serendipity and knew how to take it from 0% to 1%. Since the bay is constantly humming at 70-90% serendipity, I simply lost the urge to contribute.
Benefactors are so over / beneficiaries are so back / etc.
Let
FairBot
be the player that sends an opponent toCooperate
(C
) if it is provable that they cooperate withFairBot
, and sends them toDefect
(D
) otherwise.Let
FairBot_k
be the player that searches for proofs of length<= k
that it’s input cooperates withFairBot_k
, and cooperates if it finds one, returning defect if all the proofs of length<= k
are exhausted without one being valid.Critch writes that “100%” of the time, mathematicians and computer scientists report believing that
FairBot_k(FairBot_k) = D
, owing to the basic vision of a stack overflow exceeding the valuek
(spoiler in the footnote[1] for how it actually shakes out, in what is now a traditional result in open source game theory).I am one of these people who believe that
FairBot_k(FairBot_k) = D
, because I don’t understand Löb, nor do I understand parametric Löb. But I was talking about this on two separate occasions with friends Ben and Stephen, both of whom made the same remark, a remark which I have not seen discussed.The solution set of an equation approach.
One shorter way of writing
FairBot
is thisFB:=a↦a(FB)
because when a lands in {C,D}, ifa(FB)=CthenCelseD collapses to a(FB).
Here, I’m being sloppy about evaluation vs. provability. I’m taking what was originally ”a(FB) is provable” and replacing it with ”a is evaluable at FB”, and assuming decidability so that I can reflect into
bool
for testing in an if. Then I’m actually performing the evaluation.Stepping back, if you can write down
E:FB(a)=a(FB)
and you know that a and FB share a codomain (the moves of the game, in this case {C,D}), then the solution set of this equation SS(E)=coda=codFB. In other words, the equation is consistent at a(FB)=C=FB(a) and consistent at a(FB)=D=FB(a), and there may not be a principled way of choosing one particular item of SS(E) in general. In other words, the proofs of the type FB(a)=a(FB) are not unique.
What the heck is the type-driven story?
I’m guessing there’s some solution to this problem in MIRI’s old haskell repo, but I haven’t been able to find it reading the code yet.
I can’t think of a typey way to justify A:=A→{C,D}:Type. It’s simply nonsensical, or I’m missing something about a curry-howard correspondence with arithmoquining. In other words, agents like
FairBot
that take other agents as input and return moves are a lispy/pythonic warcrime, in terms of type-driven ideology.Questions
Am I confused because of a subtlety distinguishing evaluability and provability?
Am I confused because of some nuance about how recursion really works?
it turns out that Löb’s theorem implies that
FairBot
cooperates withFairBot
, and a proof-length-aware variant of Löb’s theorem implies thatFairBot_k
cooperates withFairBot_k
.It is almost certainly true that setting k=1, Fairbot_1 defects against Fairbot_1 because there are no proofs of cooperation that are 1 bit in length. There can be exceptions: for instance, where Fairbot_1(Fairbot_1) = C is actually an axiom, and represented with a 1-bit string.
It is definitely not true that Fairbot_k cooperates with Fairbot_k for all k and all implementations of Fairbot_k, with or without Löb’s theorem. It is also definitely not true that Fairbot_k defects against Fairbot_k in general. Whether they cooperate or defect depends upon exactly what proof system and encoding they are using.
I think that to get the type of the agent, you need to apply a fixpoint operator. This also happens inside the proof of Löb for constructing a certain self-referential sentence.
(As a breadcrumb, I’ve heard that this is related to the Y combinator.)
I find myself, just as a random guy, deeply impressed at the operational competence of airports and hospitals. Any good books about that sort of thing?
It is pretty impressive that they function as well as they do, but seeing how the sausage is made (at least in hospitals) does detract from it quite substantially. You get to see not only how an enormous number of battle hardened processes prevent a lot of lethal screw-ups, but also how also how sometimes the very same processes cause serious and very occasionally lethal screw-ups.
It doesn’t help that hospitals seem to be universally run with about 90% of the resources they need to function reasonably effectively. This is possibly because there is relentless pressure to cut costs, but if you strip any more out of them then people start to die from obviously preventable failures. So it stabilizes at a point where everything is much more horrible than it could be, but not quite to an obviously lethal extent.
As far as your direct question goes, I don’t have any good books to recommend.
Rats and EAs should help with the sanity levels in other communities
Consider politics. You should take your political preferences/aesthetics, go to the tribes that are based on them, and help them be more sane. In the politics example, everyone’s favorite tribe has failure modes, and it is sort of the responsibility of the clearest-headed members of that tribe to make sure that those failure modes don’t become the dominant force of that tribe.
Speaking for myself, having been deeply in an activist tribe before I was a rat/EA, I regret I wasn’t there to help the value-aligned and clear-headed over the last few years while some of that tribe’s worst pathologies made gains. Now it seems almost too late for them.
Actionably, I want you to
Write for journals, forums, blogospheres, zines outside of rat and EA.
Dump time into tribes that might not be the state of the art in sanity, find the most sane people there, and find ways to support them.
I speak not (well, not entirely) from my cognitive dissonance at having abandoned an aesthetic I still have feelings for. I think
Tribes besides ours are what make up the overall sanity waterline
It’s ok to set aside humility and imposter syndrome and say “I can actionably be a resource of sanity for someone else”, even tho you personally think you have a lot of work to do at getting less wrong yourself. I would say the opposite of the “affix your mask before helping others” comic strip: find synergies between mentoring others in the art and continuing to master the art yourself.
We basically want every tribe to believe true things and think clearly about their values. Yes, I’m obviously concerned that this will lead to some of my fellow rats taking my advice, applying it to a political aesthetic I find barbaric, and helping that political aesthetic win—I think this concern is basically fine because on net I expect more true beliefs and clear thinking about values to make the meaning of winning for each tribe converge on something that isn’t zero-sum.
I should also mention that I expect an externality from this effort to be an increase in the intrarat / intraEA intellectual diversity.
But what if that makes my tribe lose the political battle?
I mean, if rationality actually helped win political fights, by the power of evolution we already would have been all born rational...
1. Evolution does not magically get from A to B instantly.
2. Evolution does not necessarily care about X for many values of X.
This can include: winning political fights, whether or not nukes are built and many other things.
Claims—thoughts on chapter eight of Craft of Research
Broadly, the two kinds of claims are conceptual and practical.
Conceptual claims ask readers not to ask, but to understand. The flavors of conceptual claim are as follows:
Claims of fact or existence
Claims of definition and classification
Claims of cause and consequence
Claims of evaluation or appraisal
There’s essentially one flavor of practical claim
Claims of action or policy.
If you read between the lines, you might notice that a kind of claim of fact or cause/consequence is that a policy works or doesn’t work to bring about some end. In this case, we see that practical claims deal in ought or should. There is a difference, perhaps subtle perhaps not, between “X brings about Y” and “to get Y we ought to X”.
Readers expect a claim to be specific and significant. You can evaluate your claim along these two axes.
To make a claim specific, you can use precise language and explicit logic. Usually, precision comes at the cost of a higher word count. To gain explicitness, use words like “although” and “because”. Note some fields might differ in norms.
You can think of significance of a claim as the quantity it asks readers to change their mind, or I suppose even behavior.
Avoid arrogance.
Two ways of avoiding arrogance are acknowledging limiting conditions and using hedges to limit certainty.
Don’t run aground: there are innumerable caveats that you could think of, so it’s important to limit yourself only to the most relevant ones or the ones that readers would most plausibly think of. Limiting certainty with hedging is given by example of Watson and Crick, publishing what would become a high-impact result, “We wish to suggest … in our opinion … we believe … Some … appear”
It is not obvious how to walk the line between hedging too little and hedging too much.
This may be context-dependent. Different countries probably have different cultural norms. Norms may differ for higher-status and lower-status speakers. Humble speech may impress some people, but others may perceive it as a sign of weakness. Also, is your audience fellow scientists or are you writing a popular science book? (More hedging for the former, less hedging for the latter.)
notes (from a very jr researcher) on alignment training pipeline
Training for alignment research is one part competence (at math, cs, philosophy) and another part having an inside view / gears-level model of the actual problem. Competence can be outsourced to universities and independent study, but inside view / gears-level model of the actual problem requires community support.
A background assumption I’m working with is that training as a longtermist is not always synchronized with legible-to-academia training. It might be the case that jr researchers ought to publication-maximize for a period of time even if it’s at the expense of their training. This does not mean that training as a longtermist is always or even often orthogonal to legible-to-academia training, it can be highly synchronized, but it depends on the occasion.
It’s common to query what relative ratio should be assigned to competence building (textbooks, exercises) vs. understanding the literature (reading papers and alignment forum), but perhaps there is a third category- honing your threat model and theory of change.
I spoke with a sr researcher recently who roughly said that a threat model with a theory of change is almost sufficient for an inside view / gears-level model. I’m working from the theory that honed threat models and your theory of change are important to calculate interventions. See Alice and Bob in Rohin’s faq.
I’ve been trying by doing exercises with a group of peers weekly to hone my inside view / gears-level model of the actual problem. But the sr researcher i spoke to said mentorship trees of 1:1 time, not exercises that jrs can just do independently or in groups, is the only way it can happen. This is troublesome to me, as the bottleneck becomes mentors’ time. I’m not so much worried about the hopefully merit-based process of mentors figuring out who’s worth their time, as I am about the overall throughput. It gets worse though- what if the process is credentialist?
Take a look at the Critch quote from the top of Rohin’s faq:
Is he implicitly saying that he offloads some of the filtering work to admissions people at top schools? Presumably people from non-top schools are also emailing him, but he doesn’t mention them.
I’d like to see a claim that admissions people at top schools are trustworthy. No one has argued this to my knowledge. I think sometimes the movement falls back on status games, unless there is some intrinsic benefit to “top schools” (besides building social power/capital) that everyone is aware of. (Indeed if someone’s argument is that they identified a lever that requires a lot of social power/capital, then they can maybe put that top school on their resume to use, but if the lever is strictly high quality useful research (instead of say steering a federal government) this doesn’t seem to apply).
I don’t think Critch’s saying that the best way to get his attention is through cold emails backed up by credentials. The whole post is about him not using that as a filter to decide who’s worth his time but that people should create good technical writing to get attention.
Critch’s written somewhere that if you can get into UC Berkeley, he’ll automatically allow you to become his student, because getting into UC Berkeley is a good enough filter.
Where did he say that? Given that he’s working at UC Berkeley I would expect him to treat UC Berkeley students preferentially for reasons that aren’t just about UC Berkeley being able to filter.
It’s natural that you can sign up for one of the classes he teaches at UC Berkeley by being a student of UC Berkeley.
Being enrolled into MIT might be just as hard as being enrolled into UC Berkeley but it doesn’t give you the same access to courses taught at UC Berkeley by it’s faculty.
http://acritch.com/ai-berkeley/
and also
Okay, he does speak about using Berkeley as a filter but he doesn’t speak about taking people as his student.
It seems about helping people in UC Berkeley to connect with other people in UC Berkeley.
I’m excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that chatgpt and I can both predict what a compiler of a suitably popular programming language will do on some input, what’s going on there---- surely we’re not reimplementing the compiler on our substrate, even in the limit of perfect prediction? Will be an opportunity for a programming language theorist in another year or two of interp progress
Proof cert memo
In the Safeguarded AI programme thesis[1], proof certificates or certifying algorithms are relied upon in the theory of change. Let’s discuss!
From the thesis:
The abstract of citation 33 (McConnell et al)[2]
In this memo, I do an overview by distinguishing a proof cert from a constructive proof, and dive shallowly into the paper. Then I ruminate on how this might apply in AI contexts.
What’s the difference between a proof cert and a constructive proof?
If a cert is just a witness, you might think that proof certs are just proofs from constructive math as distinct from classical math. For example, under some assumptions and caveats (in the calculus/topology setting) a function can be “known” to have a fixed point without us methodically discovering them with any guarantees, while under other assumptions and caveats (namely the lattice theoretic setting) we know there’s a fixed point because we have a guaranteed procedure for computing it. However, they go further: a certifying algorithm produces with each output a cert/witness that the particular output has, in the case of McConnell et al, “not been compromised by a bug”. This isn’t a conceptual leap from constructive math, in which a proof is literally a witness-building algorithm, it looks to me a bit like a memo saying “btw, don’t throw out the witness” along with notes about writing cheap verifiers that do not simply reimplement the input program logic.
One way of looking at a certificate is that it’s dependent on metadata. This may simply be something read off of an execution trace, or some state that logs important parts of the execution trace to construct a witness like in the bipartite test example. In the bipartite test example, all you need is for the algorithm that searches for a two-coloring or an odd cycle, and crucially will return either the two-coloring or the odd cycle as a decoration on it’s boolean output (where “true” means “is bipartite”). Then, a cheap verifier (or checker) is none other than the pre-existing knowledge that two-colorable and bipartite are equivalent, or conversely that the existence of an odd cycle is equivalent to disproving bipartiteness.
Chapter 11 of [3] will in fact discuss certification and verification supporting eachother, which seems to relax the constraint that a cheap verifier doesn’t simply reimplement the certifier logic.
The main difference between a standard proof as in an applied type theory (like lean or coq) and a proof cert is kinda cultural and not fundamental, in that proof certs prefer to emphasize single IO pairs and lean/coq proofs often like to emphasize entire input types. Just don’t get confused on page 34 when it says “testing on all inputs”—it means that a good certifying algorithm is a means to generate exhaustive unit tests, not that the witness predicate actually runs on all inputs at once. It seems like the picking out of only input of interest and not quantifying over the whole input type will be the critical consideration when we discuss making this strategy viable for learned components or neural networks.
Formally:
Skip this if you’re not super compelled by what you know so far about proof certificates, and jump down to the section on learned programs.
Consider algorithms α:X→Y. A precondition ϕ:X→B and a postcondition ψ:X→Y→B form an IO-spec or IO-behavior. An extension of the output set Y⊥=Y∪{⊥} is needed to account for when the precondition is violated. We have a type W of witness descriptions. Finally, we describe the witness predicate with W:X→Y⊥→W→B that says IO pair xy is witnessed by w. So when ψxy is true, w must be a witness to the truth, else a witness to the falsehood. McConnell et al distinguish witness predicates from strong witness predicates, where the former can prove ¬ϕx∨ψxy but the latter knows which disjunct.
A checker for W is an algorithm sending xyw↦Wxyw. It’s desiderata / nice to haves are correctness, runtime linear in input, and simplicity.
I’m not that confident this isn’t covered in the document somewhere, but McConnell et al don’t seem paranoid about false functions. If Alice (sketchy) claims that a y came from α on x, but she lied to Bob and in fact ran α′, there’s no requirement for witness predicates to have any means of finding out. (I have a loose intuition that the proper adversarial adaptation of proof certs would be possible on a restricted algorithm set, but impose a lot of costs). Notice that W is with respect to a fixed x, W′:∀(x:X),Y(X)→W(X)→B would be a different approach! We should highlight that a witness predicate only tells you about one input.
Potential for certificates about learning, learned artefacts, and learned artefacts’ artefacts
The programme thesis continues:
If AIs are developing critical software (like airplanes, cars, and weaponry) and assuring us that it’s legit, introducing proof certs removes trust from the system. The sense in which I think Safeguarded AI primarily wants to use proof certs certainly doesn’t emphasize certs of SGD or architectures, nor does it even emphasize inference-time certs (though I think it’d separately be excited about progress there). Instead, an artefact that is second order removed from the training loop seems to be the target of proof certs, like a software artefact written by a model but it’s runtime is not the inference time of the model. I see proof certs in this context as primarily an auditing/interpretability tool. There’s no reason a model should write code that is human readable, constraining it to write human readable code seems hard, letting the code be uninterpretable but forcing it to generate interpretable artefacts seems pretty good! The part I’m mostly not clear on is why proof certs would be better than anything else in applied type theory (where proofs are just lambdas, which are very decomposable- you can’t read them all at once but they’re easy to pick apart) or model checking (specs are logical formulae, even if reading the proof that an implementation obeys a spec is a little hairy).
It’s important that we’re not required to quantify over a whole type to get some really valuable assurances, and this proof cert framework is amenable to that. On the other hand, you can accomplish the same relaxation in lean or coq, and we can really only debate over ergonomics. I think ultimately, the merits of this approach vs other approaches are going to be decided entirely on the relative ergonomics of tools that barely exist yet.
Overall impression
McConnell et al is 90 pages, and I only spent a few hours with it. But I don’t find it super impressive or compelling. Conceptually, a constructive proof gets me everything I want out of proof certificates. A tremendous value proposition would be to make these certs easier to obtain for real world programs than what coq or lean could provide, but I haven’t seen those codebases yet. There’s more literature I may look at around proof-carrying code (PCC)[4] which is similar ideas, but I’m skeptical that I’ll be terribly compelled since the insight “just decorate the code with the proof” isn’t very subtle or difficult.
Moreover, this literature just seems kinda old I don’t see obvious paths from it to current research.
https://www.aria.org.uk/wp-content/uploads/2024/01/ARIA-Safeguarded-AI-Programme-Thesis-V1.pdf
http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf
http://alg.cs.uni-kl.de/en/team/schweitzer/publikationen/docs/CertifyingAlgorithms.pdf
https://www.cs.princeton.edu/~appel/papers/pccmodel.pdf
any interest in a REMIX study group? in meatspace in berkeley, a few hours a week. https://github.com/redwoodresearch/remix_public/
Maybe! (I recently started following the ARENA curriculum, but there’s probably a lot of overlap.)
Any tips for getting out of a “rise to the occasion mindset” and into a “sink to your training” mindset?
I’m usually optimizing for getting the most out of my A-game bursts. I want to start optimizing for my baseline habits, instead. I should cover the B-, C-, and Z-game; the A-game will cover itself.
Mathaphorically, “rising to the occasion” is taking a max of a max, whereas “sinking to the level of your habits” looks like a greatest lower bound.
Yall, this is a rant. It will be sloppy.
I’m really tired of high functioning super smart “autism” like ok we all have madeup diagnoses—anyone with a IQ slightly above 90 knows that they can learn the slogans to manipulate gatekeepers to get performance enhancement, and they decide not to if they think theyre performing well enough already. That doesn’t mean “ADHD” describes something in the world. Similarly, there’s this drift of “autism” getting more and more popular. It’s obnoxious because labels and identities are obnoxious, but i only find it repulsive because of the general trend of articulate and charismatic minorities setting agendas that effect the less talkative (and worse off!) fellow minorities https://open.substack.com/pub/jessesingal/p/why-disability-advocates-are-trying?utm_source=share&utm_medium=android&r=5hj2m (I only read up to free tier, but I’ve seen a bunch of this stuff).
Maybe neuroscientists or psychologists have good reasons for this, but “autism” is the most immensely deranged word in the history of categories—what utility is a word that crosses absent minded professors and people who can’t conceptually distinguish a week from a month insofar as you can wordlessly elicit conceptual understanding from them???? If you worked at the dictionary factory and you tried to slip that word in, you’d be fired immediately. So why do psychologists or neuroscientists get away with this???
I’m a computer guy. I’m bad at social queues sometimes. I feel at home in lots of sperg culture. But leave me out of your stolen valor campaign. We’re fine, you guys—we’re smart enough to correct for most of the downsides.
https://www.thefp.com/p/the-autism-surge-lies-conspiracies
For the record, to mods: I waited till after petrov day to answer the poll because my first guess upon receiving a message on petrov day asking me to click something is that I’m being socially engineered. Clicking the next day felt pretty safe.
“EV is measure times value” is a sufficiently load-bearing part of my worldview that if measure and value were correlated or at least one was a function of the other I would be very distressed.
Like in a sense, is John threatening to second-guess hundreds of years of consensus on is-ought?
oh dear
I’m not sure what measure is referring to here.
probability density
messy, jotting down notes:
I saw this thread https://twitter.com/alexschbrt/status/1666114027305725953 which my housemate had been warning me about for years.
failure mode can be understood as trying to aristotle the problem, lack of experimentation
thinking about the nanotech ASI threat model, where it solves nanotech overnight and deploys adversarial proteins in all the bloodstreams of all the lifeforms.
These are sometimes justified by Drexler’s inside view of boundary conditions and physical limits.
But to dodge the aristotle problem, there would have to be an amount of bandwidth of what’s passing between sensors and actuators (which may roughly correspond to the number of
do
applications in pearl)Can you use something like communication complexity https://en.wikipedia.org/wiki/Communication_complexity (between a system and an environment) to think about “lower bound on the number of sensor-actuator actions” mixed with sample complexity (statistical learning theory)
Like ok if you’re simulating all of physics you can aristotle nanotech, for a sufficient definition of “all” that you would run up against realizability problems and cost way more than you actually need to spend.
Like I’m thinking if there’s a kind of complexity theory of pearl (number of
do
applications needed to acquire some kind of “loss”), then you could direct that at something like “nanotech projects” to fermstimate the way AIs might tradeoff between applying aristotlean effort (observation and induction with no experiment) and spending sensor-actuator interactions (with the world).There’s a scenario in the sequences if I recall correctly about which physics an AI infers from 3 frames of a video of an apple falling, and something about how security mindset suggests you shouldn’t expect your information-theoretic calculation that einsteinian physics is impossible to believe from the three frames to actually apply to the AI. Which is a super dumbed down way of opening up this sort of problem space.
Methods, famously, includes the line “I am a descendant of the line of Bacon”, tracing empiricism to either Roger (13th century) or Francis (16th century) (unclear which).
Though a cursory wikiing shows an 11th century figure providing precedents for empiricism! Alhazen or Ibn al-Haytham worked mostly optics apparently but had some meta-level writings about the scientific method itself. I found this shockingly excellent quote
Should we do more to celebrate Alhazen as an early rationalist?
New discord server dedicated to multi-multi delegation research
DM me for invite if you’re at all interested in multipolar scenarios, cooperative AI, ARCHES, social applications & governance, computational social choice, heterogeneous takeoff, etc.
(side note I’m also working on figuring out what unipolar worlds and/or homogeneous takeoff worlds imply for MMD research).
Questions and Problems—thoughts on chapter 4 of Craft of Doing Research
Last time we discussed the difference between information and a question or a problem, and I suggested that the novelty-satisfied mode of information presentation isn’t as good as addressing actual questions or problems. In chapter 3 which I have not typed up thoughts about, A three step procedure is introduced
Topic: “I am studying …”
Question: ”… because I want to find out what/why/how …”
Significance: ”… to help my reader understand …” As we elaborate on the different kinds of problems, we will vary this framework and launch exercises from it.
The basic feedback loop introduced in this chapter relates practical with conceptual problems and relates research questions with research answers.
What should we do vs. what do we know—practical vs conceptual problems
Opposite eachother in the loop are practical problems and conceptual problems. Practical problems are simply those which imply uncertainty over decisions or actions, while conceptual problems are those which only imply uncertainty over understanding. Concretely, your bike chain breaking is a practical problem because you don’t know where to get it fixed, implying that the research task of finding bike shops will reduce your uncertainty about how to fix the bike chain.
Conditions and consequences
The structure of a problem is that it has a condition (or situation) and the (undesirable) consequences of that condition. The consequences-costs model of problems holds both for practical problems and conceptual problems, but comes in slightly different flavors. In the practical problem case, the condition and costs are immediate and observed. However, a chain of “so what?” must be walked.
One person’s cost may be another person’s condition, so when stating the cost you ought to imagine a socratic “so what?” voice, forcing you to articulate more immediate costs until the socratic voice has to really reach in order to say that it’s not a real cost.
The conceptual problem case is where intangibles play in. The condition in that case is always the simple lack of knowledge or understanding of something. The cost in that case is simple ignorance.
Modus tollens
A helpful exercise is if you find yourself saying “we want to understand x so that we can y”, try flipping to “we can’t y if we don’t understand x”. This sort of shifts the burden on the reader to provide ways in which we can y without understanding x. You can do this iteratively: come up with _z_s which you can’t do without y, and so on.
Pure vs. applied research
Research is pure when the significance stage of the topic-question-significance frame refers only to knowing, not to doing. Research is applied when the significance step refers to doing. Notice that the question step, even in applied research, refers to knowing or understanding.
Connecting research to practical consequences
You might find that the significance stage is stretching a bit to relate the conceptual understanding gained from the question stage. Sometimes you can modify and add a fourth step to the topic-question-significance frame and make it into topic-conceptual question-conceptual significance-possible practical application. Splitting significance into two helps you draw reasonable, plausible applications. A claimed application is a stretch when it is not plausible. Note: the authors suggest that there is a class of conceptual papers in which you want to save practical implications entirely for the conclusion, that for a certain kind of paper practical applications do not belong in the introduction.
AI safety
One characterisitic of AI safety that makes it difficult both to do and interface with is the chains of “so what” are often very long. The path from deconfusion research to everyone dying or not dying feels like a stretch if not done carefully, and has a lot of steps when done carefully. As I mentioned in my last post, it’s easy to get sucked into the “novel information for it’s own sake” regime at least as a reader. More practical oriented approaches are perhaps those that seek new regimes for how to even train models, and the “so what?” is answered “so we have dramatically less OODR-failures” or something. The condition-costs framework seems really beneficial for articulating alignment agendas and directions.
Misc
“Researchers often begin a project without a clear idea of what the problem even is.”
Look for problems as you read. When you see contradictions, inconsistencies, incomplete explanations tentatively assume that readers would or should feel the same.
Ask not “Can I solve it?” but “will my readers think it ought to be solved?”
“Try to formulate a question you think is worth answering, so that down the road, you’ll know how to find a problem others think is worth solving.”
talk to friends as a half measure
When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy to keep such a journal is kinda a hurdle (which is why products like https://fatebook.io are so good!).
I find that a nice midpoint between the full and correct internal track record practices (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self delusion) is talking to friends, because I think my memory of conversations that are had out loud with other people is more detailed and honest than my memory of things I’ve thought / used to think, especially when it’s a stressful and treacherous topic.[1]
I may be more socially attuned than average around here(?) so this may not work for people less socially attuned than me
what’s the best essay on asking for advice?
Going over etiquette and the social contract, perhaps if it’s software specific it talks about minimal reproducers, whatever else the author thinks is involved.
A sketch I’m thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden, however, also don’t be paralyzed so budget some leeway to be less than maximally considerate in this way when you really need to.
Guaranteed Safe AI paper club meets again this thursday
Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA
blurb about the paper in last month’s newsletter:
Yoshua Bengio is giving a talk online tomorrow https://lu.ma/4ylbvs75
GSAI paper club is tomorrow (gcal ticket), summary (by me) and discussion of this paper
i’m getting back into composing and arranging. send me rat poems to set to music!
Does anyone use vim / mouse-minimal browser? I like Tridactyl better than the other one I tried, but it’s not great when there’s a vim mode in a browser window everything starts to step on eachother (like in jupyter, colab, leetcode, codesignal)
Trydactyl is amazing. You can disable the mode on specific websites by running the
blacklistadd
command. If you have configured that already, these settings can also be saved in your config file. Here’s my config (though careful before copying my config. It has fixamo_quiet enabled, a command that got Tridactyl almost removed when it was enabled by default. You should read what it does before you enable it.)Here are my ignore settings:
I’m halfway through how to measure anything: cybersecurity, which doesn’t have a lot of specifics to cybersecurity and mostly reviews the first book. I never finished the first one, and it was about four years ago that I read the parts that I did.
I think for top of the funnel EA recruiting it remains the best and most underrated book. Basically anyone worried about any kind of problem will do better if they read it, and most people in memetically adaptive / commonsensical activist or philanthropic mindsets probably aren’t measuring enough.
However, the material is incredibly basic for someone who’s been hanging out with EAs or on LessWrong for even a little bit. You’ve already absorbed so much of it from the water supply.
What’s different there compared to the first book?
I read the first one and found it to resonate strongly, but also found my mental models to not fit well with the general thrust. Since then I’ve been studying stats and thinking more about measurement with the intent to reread the first book. Curious if the cybersecurity one adds something more though
In terms of the parts where the books overlap, I didn’t notice anything substantial. If anything the sequel is less, cuz there wasn’t enough detail to get into tricks like the equivalent bet test.
preorders as the barest vocabulary for emergence
We can say “a monotonic map, Φ∈mono(QP) is a phenomenon of P as observed by Q”, then, emergence is simply the impreservation of joins.
Given preorders (P,≤P) and (Q,≤Q), we say a map in mono(QP) “preserves” joins (which, recall, are least upper bounds) iff ∀ab∈P,Φa∨QΦb=Φ(a∨Pb) where by “x=y” we mean x≤y∧y≤x.
Suppose Φ is a measurement taken from a particle. We would like for our measurement system to be robust against emergence, which is literally operationalized by measuring one particle, measuring another, then doing some operation on the two results and getting the exact same thing as you would have gotten if you smashed the particles together somehow before taking the (now, single) measurement. But we don’t always get what we want.
Indeed, for arbitrary preorders and monotone arrows, you can prove Φa∨QΦb≤QΦ(a∨Pb), which we interpret as saying “smashing things together before measuring gives you more information than measuring two things then somehow combining them”.
In the sequences community, emergence is a post-it note that says “you’re confused or uncertain, come back here to finish working later” (Eliezer, 2008 or whatever). In the applied category theory community, emergence is also a failure of understanding but the antidote, namely reductions to composition, is prescribed.
This is all in chapter 1 of seven sketches on compositionality by fong and spivak, citing a thesis by someone called adam.
Jotted down some notes about the law of mad science on the EA Forum. Looks like some pretty interesting open problems in the global priorities, xrisk strategy space. https://forum.effectivealtruism.org/posts/r5GbSZ7dcb6nbuWch/quinn-s-shortform?commentId=DqSh6ifdXpwHgXnCG
Ambition, romance, kids
Two premises of mine are that I’m more ambitious than nearly everyone I meet in meatspace and normal distributions. This implies that in any relationship, I should expect to be the more ambitious one.
I do aspire to be a nagging voice increasing the ambitions of all my friends. I literally break the ice with acquaintances by asking “how’s your master plan going?” because I try to create vibes like we’re having coffee in the hallway of a supervillain conference, and I like to also ask “what harder project is your current project a warmup for?”.
I’m mostly sure I want kids. I told a gf recently (who does not want kids) that if it seemed like someone would be a good coparent, but they made me less ambitious, I would accept the bargain. But what’s implicit premise here?
The premise is of course that in relationships, you drift toward the average of yourself and the other person. Is this plausibly true?
I think there’s a folk wisdom about friendships, which generalizes to romance, that you’re a weighted average of your influences, so you should exercise caution in picking your influences.
Also—autonomy to leave a deadend job and go to EA Hotel was an important part of my ability to cultivate ambition. What price should I put on giving up that autonomy?
However, according to Owain’s comment here, there’s not a super good reason to expect children to decrease ambition. But it’s complicated—that dataset doesn’t express parenting quality.
One comment you could make is “move to the bay and you’ll no longer be the most ambitious person you run into in meatspace”. I’m empirically not someone who needs to be surrounded by like minds in order to thrive, but plausibly like minds could still amplify me. (Separately, I think it’s important for everyone who can afford to not live in the bay to avoid living in the bay, because brain drain and complete absence of cool projects in non-bay cities seem really bad! But I understand that some people simply can’t be ambitious if they’re not getting social rewards for it)
I guess I wonder how best to cultivate ass-kicking, through the kind of automatic cultivation and habituation that comes built in to relationships.
I think 15-20% decrease in ambition is a reasonable price to pay for being a parent. I don’t know if that price is really exacted.
Positive and negative longtermism
I’m not aware of a literature or a dialogue on what I think is a very crucial divide in longtermism.
In this shortform, I’m going to take a polarity approach. I’m going to bring each pole to it’s extreme, probably each beyond positions that are actually held, because I think median longtermism or the longtermism described in the Precipice is a kind of average of the two.
Negative longtermism is saying “let’s not let some bad stuff happen”, namely extinction. It wants to preserve. If nothing gets better for the poor or the animals or the astronauts, but we dodge extinction and revolution-erasing subextinction events, that’s a win for negative longtermism.
In positive longtermism, such a scenario is considered a loss. From an opportunity cost perspective, the failure to erase suffering or bring to agency and prosperity to
1e1000
comets and planets hurts literally as bad as extinction.Negative longtermism is a vision of what shouldn’t happen. Positive longtermism is a vision of what should happen.
My model of Ord says we should lean at least 75% toward positive longtermism, but I don’t think he’s an extremist. I’m uncertain if my model of Ord would even subscribe to the formation of this positive and negative axis.
What does this axis mean? I wrote a little about this earlier this year. I think figuring out what projects you’re working on and who you’re teaming up with strongly depends on how you feel about negative vs. positive longtermism. The two dispositions toward myopic coalitions are “do” and “don’t”. I won’t attempt to claim which disposition is more rational or desirable, but explore each branch
When Alice wants future
X
and Bob wants futureY
, but if they don’t defeat the adversary Adam they will be stuck with future0
(containing great disvalue), Alice and Bob may set aside their differences and choose form a myopic coalition to defeat Adam or not.Form myopic coalitions. A trivial case where you would expect Alice and Bob to tend toward this disposition is if
X
andY
are similar. However, ifX
andY
are very different, Alice and Bob must each believe that defeating Adam completely hinges on their teamwork in order to tend toward this disposition, unless they’re in a high trust situation where they each can credibly signal that they won’t try to get a head start on theX
vs.Y
battle until0
is completely ruled out.Don’t form myopic coalitions. A low trust environment where Alice and Bob each fully expect the other to try to get a head start on
X
vs.Y
during the fight against0
would tend toward the disposition of not forming myopic coalitions. This could lead to great disvalue if a project against Adam can only work via a team of Alice and Bob.An example of such a low-trust environment is, if you’ll excuse political compass jargon, reading bottom-lefts online debating internally the merits of working with top-lefts on projects against capitalism. The argument for coalition is that capitalism is a formiddable foe and they could use as much teamwork as possible; the argument against coalition is historical backstabbing and pogroms when top-lefts take power and betray the bottom-lefts.
For a silly example, consider an insurrection against broccoli. The ice cream faction can coalition with the pizzatarians if they do some sort of value trade that builds trust, like the ice cream faction eating some pizza and the pizzatarians eating some ice cream. Indeed, the viciousness of the fight after broccoli is abolished may have nothing to do with the solidarity between the two groups under broccoli’s rule. It may or may not be the case that the ice cream faction and the pizzatarians can come to an agreement about best to increase value in a post-broccoli world. Civil war may follow revolution, or not.
Now, while I don’t support long reflection (TLDR I think a collapse of diversity sufficient to permit a long reflection would be a tremendous failure), I think elements of positive longtermism are crucial for things to improve for the poor or the animals or the astronauts. I think positive longtermism could outperform negative longtermism when it comes to finding synergies between the extinction prevention community and the suffering-focused ethics community. However, I would be very upset if I turned around in a couple years and positive longtermists were, like, the premiere face of longtermism. The reason for this is once you admit positive goals, you have to deal with everybody’s political aesthetics, like a philosophy professor’s preference for a long reflection or an engineer’s preference for moar spaaaace or a conservative’s preference for retvrn to pastorality or a liberal’s preference for intercultural averaging. A negative goal like “don’t kill literally everyone” greatly lacks this problem. Yes, I would change my mind about this if 20% of global defense expenditure was targeted at defending against extinction-level or revolution-erasing events, then the neglectedness calculus would lead us to focus the by comparison smaller EA community on positive longtermism.
The takeaway from this shortform should be that quinn thinks negative longtermism is better for forming projects and teams.
The audience models of research—thoughts on Craft of Doing Research chapter 2
Before considering the role you’re creating for your reader, consider the role you’re creating for yourself. Your broad options are the following
I’ve found some new and interesting information—I have information for you
I’ve found a solution to an important practical problem—I can help you fix a problem
I’ve found an answer to an important question—I can help you understand something better
The authors recommend assuming one of these three. There is of course a wider gap between information and the neighborhood of problems and questions than there is between problems and questions! Later on in chapter four the authors provide a graph illustrating problems and questions:
Practical problem -> motivates -> Research question -> defines -> Conceptual/research problem
. Information, when provided mostly for novelty, however, is not in this cycle. Information can be leveled at problems or questions, plays a role in providing solutions or answers, but can also be for “its own sake”.I’m reminded of a paper/post I started but never finished, on providing a poset-like structure to capabilities. I thought it would be useful if you could give a precise ordering on a set of agents, to assign supervising/overseeing responsibilities. Looking back, providing this poset would just be a cool piece of information, effectively: I wasn’t motivated by a question or problem so much as “look at what we can do”. Yes, I can post-hoc think of a question or a problem that the research would address, but that was not my prevailing seed of a reason for starting the project. Is the role of the researcher primarily a writing thing, though, applying mostly to the final draft? Perhaps it’s appropriate for early stages of the research to involve multi-role drifting, even if it’s better for the reader experience if you settle on one role in the end.
Additionally, it occurs to me that maybe “I have information for you” mode just a cheaper version of the question/problem modes. Sometimes I think of something that might lead to cool new information (either a theory or an experiment), and I’m engaged moreso by the potential for novelty than I am by the potential for applications.
I think I’d like to become more problem-driven. To derive possibilities for research from problems, and make sure I’m not just seeking novelty. At the end of the day, I don’t think these roles are “equal” I think the problem-driven role is the best one, the one we should aspire to.
The three reader roles complementing the three writer roles are
Entertain me
Help me solve my practical problem
Help me understand something better
It’s basically stated that your choice of writer role implies a particular reader role, 1 mapping to 1, 2 mapping to 2, and 3 mapping to 3.
Role 1 speaks to an important difficulty in the x-risk, EA, alignment community; which is how not to get drawn into the phenomenal sensation of insight when something isn’t going to help you on a problem. At my local EA meetup I sometimes worry that the impact of our speaker events is low, because the audience may not meaningfully update even though they’re intellectually engaged. Put another way, intellectual engagement can be goodhartable, the sensation of insight can distract you from your resolve to shatter your bottlenecks and save the world if it becomes an end itself. Should researchers who want to be careful about this avoid the first role entirely? Should the alignment literature look upon the first reader role as a failure mode? We talk about a lot of cool stuff, it can be easy to be drawn in by the cool factor like some of the non-EA rationalists I’ve met at meetups.
I’m not saying reader role number two absolutely must dominate, because it can diverge from deconfusion which is better captured by reader role number three.
Division of labor between reader and writer, writer roles do not always imply exactly one reader role
Isn’t it the case that deconfusion/writer role three research can be disseminated to practical (as opposed to theoretical) -minded people, and then those people turn question-answer into problem-solution? You can write in the question-answer regime, but there may be that (rare) reader who interprets it in the problem-solution regime! This seems to be an extremely good thing that we should find a way to encourage. In general reading the drifts across multiple roles seems like the most engaged kind of reading.
Edward St Aubyn
a B-valued quantifier is any function (A→B)→B, so when B is bool quantifiers are the functions that take predicates as input and return bool as output (same for prop). the standard
max
andmin
functions on arrays count as real-valued quantifiers for some index set A.I thought I had seen ∀ as the max of the Prop-valued quantifiers, and exists as the min somewhere, which has a nice mindfeel since forall has this “big” feeling (if you determined for P:A→Prop that ∀P (of which ∀x:A,Px is just syntax sugar since the variable name x is irrelevant) by exhaustive checking, it would cost O(|A|) whereas ∃P would cost O(1) unless the derivation of the witness was dependent on size of domain somehow).
Incidentally however, in differentiable logic it seems forall is the “minimal expectation” and existential is the “maximal expectation”. Page 10 of the LDL paper, where a Emin(g(X)) is the limit as gamma goes to zero of ∫x∈Bγ(min g)p(x)g(x)dx, or the integral with respect to a γ-ball about the min of g rather than about the entire domain of g. os in this sense, the interpretation of a universally quantified prop is a minimal expectation, dual where existentially quantified prop is a maximal expectation.
I didn’t like the way this felt aesthetically, since as I said, forall feels “big” which mood-affiliates toward a max. But that’s notes from barely-remembered category theory I saw once. Anyway, I asked a language model and it said that forall is minimal because it imposes the strictest of “most conservative” requirement. so “max” in the sense of “exists is interpreted to maximal expectation” refers to maximal freedom.
I suppose this is fine.
Among monotonic, boolean quantifiers that don’t ignore their input, exists is maximal because it returns true as often as possible; forall is minimal because it returns true as rarely as possible.
claude and chatgpt are pretty good at ingesting textbooks and papers and making org-drill cards.
here’s my system prompt https://chat.openai.com/g/g-rgeaNP1lO-org-drill-card-creator though i usually tune it a little further per session.
Here are takes on the idea from the anki ecosystem
https://ankiweb.net/shared/info/1915225457
https://ankigpt.help/
I tried a little ankigpt and it was fine, i haven’t tried the direct plugin from ankiweb. I’m opting for org-drill here cuz I really like plaintext.
consider how our nonconstructive existence proof of nash equilibria creates an algorithmic search problem, which we then study with computational complexity. For example, 2-player 0-sum games are P but for three or more players general sum games are NP-hard. I wonder if every nonconstructive existence proof is like this? In the sense of inducing a computational complexity exercise to find what class it’s in, before coming up with greedy heuristics to accomplish an approximate example in practice.
I like thinking about “what it feels like to write computer programs if you’re a transformer”.
Does anyone have a sense of how to benchmark or falsify Nostalgebraist’s post on the subject?
Quick version of conversations I keep having, might be worth a top level effortpost.
A prediction market platform giving granular permission systems would open up many use cases for many people
whistleblower protections at large firms, dating, project management and internal company politics—all userbases with underserved opinions about transparency. Manifold could pivot to this but have a lot of other stuff they could do instead.
Think about slack admins are confused about how to prevent some usergroups from
@channel
and discord admins aren’t.what are your obnoxious price systems for tutoring?
There’s a somewhat niche CS subtopic that a friend wants to learn, I’m really well positioned to teach her. More discussion on the manifold bounty:
10^93 is a fun and underrated number https://en.wikipedia.org/wiki/Transcomputational_problem
I like 10120 more.
Jargon is not due to status scarcity, but it sometimes makes unearned requests for attention
When you see a new intricate discipline, and you’re reticent to invest in navigating it, asking to be convinced that your attention has been earned is fine, but I don’t recall seeing a valid or interesting complaint about jargon that deviates from this.
Some elaboration here
Like most wide-scale social phenomena, jargon is shaped by multiple incentives, with a pretty wide variance in the narrowness of consumer (insider, outsider, elite, median) and type of value provided (clarity, obfuscation, reinforcement of values, chunking of concepts).
Undertstanding a field VERY OFTEN requires understanding the people and social structures that shape the field. Jargon is useful in this dimension, as well as the surface-level content of the jargon.
There’s a remarkable TNG episode about enfeeblement and paul-based threatmodels, if I recall correctly.
There’s a post-scarcity planet with some sort of Engine of Prosperity in the townsquare, and it doesn’t require maintenance for enough generations that engineering itself is a lost oral tradition. Then it starts showing signs of wear and tear...
If paul was writing this story, they would die. I think in the actual episode, there’s a disagreeable autistic teenager who expresses curiosity about the Engine mechanisms, and the grownups basically shame him, like “shut up and focus on painting and dancing”. I think the Enterprise crew bails them out by fixing the Engine, and leaving the kid with a lesson about recultivating engineering as a discipline and a sort of intergenerational cultural heritage and responsibility.
I probably saw it over 10 years ago, I haven’t looked it up yet. Man, this is a massive boon to the science-communication elements of threatmodeling, given that the state of public discussion seems to be little middle ground between unemployment and literally everyone literally dying. We can just point people to this episode! Any thoughts?
We need a cool one-word snappy thing to say for “just what do you think you know and how do you think you know it” or like “I’m requesting more background about this belief you’ve stated, if you have time”.
I want something that has the same mouthfeel as “roll to disbelieve” for this.
Is there an EV monad? I’m inclined to think there is not, because
EV(EV(X))
is a way simpler structure than a “flatmap” analogue.Would there be a way of estimating how many people within the amazon organization are fanatical about same day delivery ratio against how many are “just working a job”? Does anyone have a guess? My guess is that an organization of that size with a lot of cash only needs about 50 true fanatics, the rest can be “mere employees”. What do yall think?
I can’t really think of any research bearing on this, and unclear how you’d measure it anyway.
One way to go might be to note that there is a wide (and weird) variance between the efficiency of companies: market pressures are slack enough that two companies doing as far as can be told the exact same thing in the same geographic markets with the same inputs might be almost 100% different (I think was the range in the example of concrete manufacturing in one paper I read); a lot of that difference appears to be explainable by the quality of the management, and you can do randomized experiments in management coaching or intensity of management and see substantial changes in the efficiency of a company (Bloom—the other one—has a bunch of studies like this). Presumably you could try to extrapolate from the effects of individuals to company-wide effects, and define the goal of the ‘fanatical’ as something like ‘maintaining top-10% industry-wide performance’: if educating the CEO is worth X percentiles and hiring a good manager is worth 0.0Y percentiles and you have such and such a number of each, then multiply out to figure out what will bump you 40 percentiles from an imagined baseline of 50% to the 90% goal.
Another argument might be a more Fermi estimate style argument from startups. A good startup CEO should be a fanatic about something, otherwise they probably aren’t going to survive the job. So we can assume one fanatic at least. People generally talk about startups beginning to lose the special startup magic of agility, focus, and fanaticism at around Dunbar’s number level of employees like 300, or even less (eg Amazon’s two-pizza rule which is I guess 6 people?). In the ‘worst’ case that the founder has hired 0 fanatics, that implies 1 fanatic can ride herd over no more than ~300 people; in the ‘best’ case that he’s hired dozens, then each fanatic can only cover for more like 2 or 3 non-fanatics. I’m not sure how we should count Amazon’s employees: do the warehouse workers, often temps, really count? They are so micro-managed and driven by the warehouse operation that they hardly seem even relevant to the question. I can’t quickly find that number, just totals, but let’s say there’s like 100,000 non-warehouse-ish employees; at a 300:1 ratio, you’d need 333, and at 3:1, 33,333. The former might be feasible, the latter not so much. (And would explain why Amazon.com seems to be a gradually degrading shopping experience—so many ads! Why are there ads getting in my way when I’m trying to give you my money already, Amazon!)
I’m not sure “fanatical” is well-defined enough to mean anything here. I doubt there are any who’d commit terrorist acts to further same-day delivery. There are probably quite a few who believe it’s important to the business, and a big benefit for many customers.
You’re absolutely right that a lot of employees and contractors can be “mere employees”, not particularly caring about long-term strategy, customer perception, or the like. That’s kind of the nature of ALL organizations and group behaviors, including corporate, government, and social groupings. There’s generally some amount of influencers/selectors/visionaries, some amount of strategists and implementers, and a large number of followers. Most organizations are multidimensional enough that the same people can play different roles on different topics as well.
I don’t think it needs any true fanatics. It just needs incentives.
This isn’t to say there won’t be fanatics anyway. There probably aren’t many things that nobody can get fanatical about. This is even more true if they’re given incentives to act fanatical about it.
Sure, but the incentive structure needs continual maintenance to keep it aligned with or pointing at the goal, which naturally leads to the questions of how many people are needed to keep the structure pointing at the goal, and what the motivation of those people will be.
We need a name for the following heuristic, I think, I think of it as one of those “tribal knowledge” things that gets passed on like an oral tradition without being citeable in the sense of being a part of a literature. If you come up with a name I’ll certainly credit you in a top level post!
I heard it from Abram Demski at AISU′21.
Suppose you’re either going to end up in world A or world B, and you’re uncertain about which one it’s going to be. Suppose you can pull lever LA which will be 100 valuable if you end up in world A, or you can pull lever LB which will be 100 valuable if you end up in world B. The heuristic is that if you pull LA but end up in world B, you do not want to have created disvalue, in other words, your intervention conditional on the belief that you’ll end up in world A should not screw you over in timelines where you end up in world B.
This can be fully mathematized by saying “if most of your probability mass is on ending up in world A, then obviously you’d pick a lever L such that V(L|A) is very high, just also make sure that V(L|B)>=0 or creates an acceptably small amount of disvalue.”, where V(L|A) is read “the value of pulling lever L if you end up in world A”
Why are you specifying 100 or 0 value, and using fuzzy language like “acceptably small” for disvalue?
Is this based on “value” and “disvalue” being different dimensions, and thus incomparable? Wouldn’t you just include both in your prediction, and run it through your (best guess of) utility function and pick highest expectation, weighted by your probability estimate of which universe you’ll find yourself in?
100 and 0 in this context make sense. Or at least in my initial reading: arbitrarily-chosen values that are in a decent range to work quickly with (akin to why people often work in percentages instead of 0..1)
It is—I’m going to say “often”, although I am aware this is suboptimal phrasing—often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
As such, you can often end up with discontinuities at zero.
Dropping the entire probability distribution of outcomes through your utility function doesn’t even necessarily have a closed-form result. In a universe where computation itself is a cost, finding a cheaper heuristic (and working through if said heuristic has any particular basis or problems) can be valuable.
The heuristic in the grandparent comment is just what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
It is often the case that you are confident in the sign of an outcome but not the magnitude of the outcome.
This heuristic is what happens if you are simultaneously very confident in the sign of positive results, and have very little confidence in the magnitude of negative results.
I’m not sure I understand. If the lever is +100 in world A and −90 in world B, it seems like a good bet if you don’t know which world you’re in. Or is that what you mean by “acceptably small amount of disvalue”?
Obviously there are considerations downstream of articulating this, one is that when P(A)>P(B) but V(LA|A)<V(LB|B) so it’s reasonable to hedge on ending up in world B even though it’s not strictly more probable than ending up in world A.
critiques and complaints
I think one of the most crucial meta skills i’ve developed is honing my sense of who’s criticizing me vs. who’s complaining.
A criticism is actionable, implicitly often it’s from someone who wants you to win. A complaint is when you can’t figure out how you’d actionably fix something or improve based on what you’re being told.
This simple binary story is problematic. It can empower you to ignore criticism you don’t like by providing a set of excuses, if you’re not careful. Sometimes it’s operationally impossible to parse out a criticism that runs so deep that it unsettles your premises from a complaint! I think people who are building things can be excused for ignoring advice if the only actionable way of accepting that advice is to completely overhaul their approach, for reasons of focus and other logistical concerns. If it’s that rare time in a project when you are going back to the drawing board and starting over, that’s definitely time to mine complaints for useful insight.
Related: the legend of the amazon customer in the 90s who was insatiably filling out customer feedback forms, to the point where 2000s or 2010s amazon named a boardroom after him. The idea was that this guy helped them improve a lot—surely it would have been easy to dismiss him as a complainer, but they didn’t, they found actionable advice within the complaints. I think your ability to take something that isn’t intended to help you, isn’t actionable on it’s face, and mining it for actionable insight can be very important. But for filtering, for attention, for sanity, dismissing something quickly because it doesn’t seem like it can help you or the project improve can be valid as well.
hmu for a haskell job in decentralized finance. Super fun zero knowledge proof stuff, great earning to give opportunity.
Are shelling points the occam’s razor of mechanism design?
Intuitively I think simplicity is a good explanation for a solution being converged upon.
Does anyone have any crisp examples that violate the schelling point—occam’s razor correspondence?
Disvalue via interpersonal expected value and probability
My deontologist friend just told me that treating people like investments is no way to live. The benefits of living by that take are that your commitments are more binding, you actually do factor out uncertainty, because when you treat people like investments you always think “well someday I’ll no longer be creating value for this person and they’ll drop me from their life”. It’s hard to make long term plans, living like that.
I’ve kept friends around out of loyalty to what we shared 5-10 years ago while questioning an expected value theory or probability theory based value prop. So I’m not, like, super guilty of this or anything. But overall I do take expected value theory and probability theory into interpersonal matters, and I don’t object when others do the same for me. Though it’s hard sometimes, I think it’s basically fine if someone drops me because I’m not adding value for them. An edge case in the opposite direction is that you’re obligated to build deep friendships with every acquaintance, which is also a little silly. But a sweet spot, like a marriage or other way of teaming up (like for a project) might meaningfully call for a suspension of expected value theory and probability theory.
One thing to be careful about in such decisions—you don’t know your own utility function very precisely, and your modeling of both future interactions and your value from such are EXTREMELY lossy.
The best argument for deontological approaches is that you’re running on very corrupt hardware, and rules that have evolved and been tested over a long period of time are far more trustworthy than your ad-hoc analysis which privileges obvious visible artifacts over more subtle (but often more important) considerations.
I may refine this into a formal bounty at some point.
I’m curious if censorship would actually work in the context of blocking deployment of superpowerful AI systems. Sometimes people will mention “matrix multiplication” as a sort of goofy edge case, which isn’t very plausible, but that doesn’t mean there couldn’t be actual political pressure to censor it. A more plausible example would be attention. Say the government threatens soft power against arxiv if they don’t pull attention is all you need, or threatens soft power against harvard if their linguistics department doesn’t pull the pytorch-annotated attention is all you need. By this point, it goes without saying that black hat hackers writing down the equations would face serious consequences if they got caught. Now instead of attention, imagine some more galaxy-brained paper or insight that gets published in 2028 and is an actual missing ingredient to advanced AI (assuming you’re not one of the people who think attention is all you need already is that paper).
While it’s certainly a research project to look at pros and cons of this approach to safety from AI, I think before that we need someone to profile efficacy of technological censorship through history to come at an estimate of how well this would work, i.e., how well it would actually slow or stop the propagation of this information, how well it would slow or stop the deployment of systems based on that information.
My guess at who the ideal person to execute on this bounty would be some patent law nerd, tho I’m sure a variety of types of nerd could do a great job.
any literature on estimates of social impact of businesses divided by their valuations?
the idea that dollars are a proxy for social impact is neat, but leaves a lot of room for goodhart and I think it’s plausible that they diverge entirely in cases. It would be useful to know, if possible to know, what’s going on here.
there’s paid tools that estimate this, probably poorly
thinking about this comment
Why have I heard about Tyson investing into lab grown, but I haven’t heard about big oil investing in renewable?
Tyson’s basic insight here is not to identify as “an animal agriculture company”. Instead, they identify as “a feeding people company”. (Which happens to align with doing the right thing, conveniently!)
It seems like big oil is making a tremendous mistake here. Do you think oil execs go around saying “we’re an oil company”? When they could instead be going around saying “we’re a powering stuff” company. Being a powering stuff company means you have fuel source indifference!
I mean if you look at all the money they had to spend on disinformation and lobbying, isn’t it insultingly obvious to say “just invest that money into renewable research and markets instead”?
Is there dialogue on this? Also, have any members of “big oil” in fact done what I’m suggesting, and I just didn’t hear about it?
Gonna cc to ea forum shortform
Yes, this is more about you not hearing about it.
Shell Has A Bigger Clean Energy Plan Than You Think — CleanTechnica Interview
BP Bets Future on Green Energy, but Investors Remain Wary
It seems that Tyson invested 150 million into a fund for new food solutions.
In contrast to that Exxon invested 600 million in algae biofuels back in 2009 and more afterward.
I do vaguely remember hearing of big oil doing that, though perhaps not as much as meat producers do with lab grown meat, try looking into it.
1. Might be a little bit harder in that industry.
2. Are they in charge (of that)? Who chose them?
you’re most likely right about it being harder in the industry!
I don’t think they need permission or an external mandate to do the right thing!
The main problem is that prior investment into the oil method of powering stuff doesn’t translate into having a comparative advantage in a renewable way of powering stuff. They want a return on their existing massive investments.
While this looks superficially like a sunk cost fallacy, it isn’t. If a comparatively small investment (mere billions) can ensure continued returns on their trillions of sunk capital for another decade, it’s worth it to them.
Investment into renewable powering stuff would require substantially different skill sets in employees, in very different locations, and highly non-overlapping investment. At best, such an endeavour would constitute a wholly owned subsidiary that grows while the rest of the company withers. At worst, a parasite that hastens the demise of the parent while eventually failing in the face of competition anyway.
I’ve had a background assumption in my interpretation of and beliefs about reward functions for as long as I can remember (i.e. since first reading the sequences), that I suddenly realized I don’t believe is written down. Over the last two years I’ve gained experience writing coq sufficient to inspire a convenient way of framing it.
Computational vs axiomatic reward functions
Computational vs axiomatic in proof engineering
A proof engineer calls a proposition computational if it’s proof can be broken down into parts.
For example,
a + (b + c) = (a + b) + c
is computational because you can think of it’s proof as the application of the associativity lemma then the application of something called a “refl”, the fundamental termination of a proof involving equality. Passing around the associativity lemma is in a sense passing around it’s proof, which assuminga
is inductive (takenat
; zero and successor) is an application ofnat
’s induction principle, unpacking the recursive definition of+
, etc.In other words, if my adversary asks “why is
a + (b + c) = (a + b) + c
I can show them; I only have to make sure they agree to the fundamental definitions ofnat
and+ : nat -> nat -> nat
, the rest I can compel them to believe.On the flip side, consider function extensionality, or
f = g <-> forall x, f x = g x
, not provable because we do not know that the domain off
(which equals the domain ofg
) is countable, to name but one scenario. Because they can’t prove it, theories “admit function extensionality as an axiom” from time to time.In other words, if I invoke function extensionality in a proof, and my adversary has agreed to the basic type and function definitions, they remain entitled to reject my proof because if they ask why I believe function extensionality the best I can do is say “because I declared it on line 7”.
We do not call reasoning involving axioms computational. Instead, the discourse has sort of become poisoned by the axiom; it’s verificational properties have become weaker. (Intuitively, I can declare on line 7 anything I want; the risk of proving something that is actually false increases a great deal with each axiom I declare).
Apocryphally, a lecturer recalled a meeting perhaps of the univalent foundations group at IAS, when homotopy type theory (HoTT) was brand new (HoTT is based on something called univalence, which is about reasoning on type equalities in arbitrary “universes” (“kinds” for the haskell programmer)). In HoTT 1.0, univalence relied on an axiom (done carefully of course, to minimize the damage of the poison) and Per Martin-Lof is said to have remarked “it’s not really type theory if there’s an axiom”. HoTT 2.0 called cubical type theory repairs this, which is why cubical tt is sometimes called computational tt.
AIXI-like and AIXI-unlike AGIs
If the space of AGIs can be carved into AIXI-like and AIXI-unlike with respect to goals, clearly AIXI-like architectures have goals imposed on them axiomatically by the programmer. The complement of course is where the reward function is computational; decomposable.
See the NARS literature of Wang et. al. for something at least adjacent to AIXI-unlike—reasoning about NARS emphasizes that reward functions can be computational to an extent, but “bottom out” at atoms eventually. Still, NARS goals are computational to a far greater degree than AIXI-likes.
Conjecture: humans are AIXI-unlike AGIs
This should be trivial: humans can decompose their reward functions in ways richer than “because god said so”.
Relation to mutability???
If the space of AGIs can be carved into AIXI-like and human-like with respect to goals, does the computationality question help me reason about modifying my own reward function? Intuitively, AIXI’s axiomatic goal corresponds to immutability. However, I don’t think there’s a for-free implication that AIXI-unlikes get self-modification for-free. More work needed.
Position of this post in my overall reasoning
In general, my basic understanding that the AGI space can be divided into what I’ve called AIXI-like and AIXI-unlike with respect to how reward functions are reasoned about, and that computationality (anaxiomaticity vs axiomaticity?) is the crucial axis to view, is deeply embedded in my assumptions. Maybe writing it down will make eventually changing my mind about this easier: I’m uncertain just how conventional my belief/understanding is here.
I should be more careful not to imply I think that we have solid specimens of computational reward functions; more that I think it’s a theoretically important region of the space of possible minds, and might factor in idealizations of agency
capabilities-prone research.
I come to you with a dollar I want to spend on AI. You can allocate
p
pennies to go to capabilities and100-p
pennies to go to alignment, but only if you know of a project that realizes that allocation. For example, we might think that GAN research setsp = 98
(providing 2 cents to alignment) while interpretability research setsp = 10
(providing 90 cents to alignment).Is this remotely useful? This is a really rough model (you might think it’s more of a venn diagram and that this model doesn’t provide a way of reasoning about the double counting problem).
a task: rate research areas, even whole agendas, with such value
p
. Many people may disagree about my example assignments to GANs and interpretability, or think both of those are too broad.What are some alternatives to the splitting a dollar intuition?
To say something is capabilities-prone is less to say a dollar has been cleanly split, and more to say that there are some dynamics that sort of tend toward or get pushed toward different directions. Perhaps I want some sort of fluid metaphor instead.
Question your argument as your readers will—thoughts on chapter 10 of Craft of Research
Three predictable disagreements are
There are causes in addition to the one you claim
What about these counterexamples?
I don’t define X as you do, to me X means...
There are roughly two kinds of queries readers will have about your argument
intrinsic soundness—“challenging the clarity of a claim, relevance of reasons, or quality of evidence”
extrinsic soundness—“different ways of framing the problem, evidence you’ve overlooked, or what others have written on the topic.” The idea is to anticipate, acknowledge, and respond to both kinds of questions. This is the path to making an argument that readers will trust and accept.
Voicing too many hypothetical objections up front can paralyze you. Instead, what you should do before anything else is focus on what you want to say. Give that some structure, some meat, some life. Then, an important exercise is to imagine readers’ responses to it.
I think cleaving these into two highly separated steps is an interesting idea, doing this with intention may be a valuable exercise next time I’m writing something.
The authors provide some questions about your problem from a possible reader:
Why do you think there’s a problem at all?
Have you properly defined the problem?
Is your solution practical or conceptual?
Have you stated your claim too strongly?
Why is your practical/conceptual solution better than others?
Then, they provide some questions about your support from a possible reader.
“I want to see a different kind of evidence” i.e. hard numbers over anecdotes / real people over cold numbers
“It isn’t accurate”
“It isn’t precise enough”
“It isn’t current”
“It isn’t representative”
“It isn’t authoritative”
“You need more evidence”
It builds credibility to play defense: to recognize your own argument’s limitations. It builds even more credibility to play offense: to explore alternatives to your argument and bring them into your reasoning. If you can, you might develop those alternatives in your own imagination, but more likely you’d like to find alternatives in your sources.
What is the perfect amount of objections to acknowledge? Acknowledging too many can distract readers from the core of your argument, while acknowledging too few is a signal of laziness or even disrespect. You need to narrow your list of alternatives or objections by subjecting them to the following priorities
It is wise to build up good faith by acknowledging questions you can’t answer. Concessions are often interpreted as positive signals by the reader.
It is important for your responses to acknowledgments to be subordinate to your main point, or else the reader will miss the forest for the trees.
Remember to make an intentional decision about how much credence to give to an objection or alternative. Weaker ones imply weaker credences, imply less effort in your acknowledgment and response.
there’s a gap in my inside view of the problem, part of me thinks that capabilities progress such as out-of-distribution robustness or the 4 tenets described in open problems in cooperative ai is necessary for AI to be transformative, i.e. a prereq of TAI, and another part of me that thinks AI will be xrisky and unstable if it progresses along other aspects but not along the axis of those capabilities.
There’s a geometry here of transformative / not transformative cross product with dangerous not dangerous.
To have an inside view I must be able to adequately navigate between the quadrants with respect to outcomes, interventions, etc.
If something can learn fast enough, then it’s out-of-distribution performance won’t matter as much. (OOD performance will still matter -but it’ll have less to learn where it’s good, and more to learn where it’s not.*)
*Although generalization ability seems like the reason learning matters. So I see why it seems necessary for ‘transformation’.
testing latex in spoiler tag
Testing code block in spoiler tag
:::what about this:::
:::hm?
x :: Bool -> Int -> String
:::::: latex Ax+1:={} :::
missed opportunities to build a predictive track record and trump
I was reminiscing about my prediction market failures, the clearest “almost won a lot of mana dollars” (if manifold markets had existed back then) was this executive order. The campaign speeches made it fairly obvious, and I’m still salty about a few idiots telling me “stop being hysterical” when I accused him of being what he’s writing on the tin that he is pre inauguration even though I overall reminisce that being a time when my epistemics were way worse than they are now.
However, there does seem like there needs to be a word for “lack of shock but failed to predict concretely”. We were threatmodeling a ton of crazy stuff back then! So what if you can econo-splain “well if you didn’t predict concretely then you were, by definition, shocked”, the more useful and accurate thing sounds more like “we were worried about various classes of populist atrocities, some of which would look hysterical in hindsight, those which would look hysterical in hindsight crowded out the ability to write detailed executive orders just to win the mana dollars / bayes points / etc.”. Early onsets of a populist swing are so anxiety-inducing and chaotic, I forgive myself for making an at least token attempt at security mindset by thinking about how bad it could get, but I shouldn’t do so too quickly—a post manifold markets populist would give me a great opportunity to take things seriously, put a little of that anxiety to use.
So of course, what is the institutional role of metaculus or manifold in the leadup to january 6 2021, or things in that reference class? Again, “didn’t write down a detailed description of what would happen, but isn’t shocked when it does”. It cost 0 IQ points to observe in the months leading up to the election that the administration would be a sore loser in worlds where they lost. So why is it so subtle to leverage this observation to gain actual mana dollars or metaculus ranking? This seems like an open problem to me.