On the other hand, the current community believes that getting AI x-safety right is the most important research question of all time. Most people would not publish something just for career advancement if it meant sucking oxygen away from more promising research directions.
This might be a mitigating factor for my comment above. I am curious what happened in research fields that had “change/save the world” vibes. Was environmental science immune to similar issues?
Daniel Paleka
because LW/AF do not have established standards of rigor like ML, they end up operating more like a less-functional social science field, where (I’ve heard) trends, personality, and celebrity play an outsized role in determining which research is valorized by the field.
In addition, the AI x-safety field is now rapidly expanding.
There is a huge amount of status to be collected by publishing quickly and claiming large contributions.
In the absence of rigor and metrics, the incentives are towards:
- setting new research directions, and inventing new cool terminology;
- using mathematics in a way that impresses, but is too low-level to yield a useful claim;
- and vice versa, relying too much on complex philosophical insights without empirical work;
- getting approval from alignment research insiders.
See also the now ancient Troubling Trends in Machine Learning Scholarship.
I expect the LW/AF community microcosm will soon reproduce many of those failures.
I think the timelines (as in, <10 years vs 10-30 years) are very correlated with the answer to “will first dangerous models look like current models”, which I think matters more for research directions than what you allow in the second paragraph.
For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.
The context window will still be much smaller than a human’s; that is, single-run performance on summarizing full-length books will be much lower than on essays of <=1e4 tokens, no matter the inherent complexity of the text.
Braver prediction, weak confidence: there will be no straightforward method to use multiple runs to effectively extend the context window in the three months after the release of GPT-4.
I am eager to see how the mentioned topics connect in the end—this is like the first few chapters of a book, reading the backstories of characters who are yet to meet.
On the interpretability side—I’m curious how you do causal mediation analysis on anything resembling “values”? The ROME paper framework shows where the model recalls “properties of an object” in the computation graph, but it’s a long way from that to editing out reward proxies from the model.
They test on the basic (Poziom podstawowy) Matura tier for the math problems.
In countries with Matura-based education, the basic tier math test is not usually taken by mathematically inclined students—it is just the law that anyone going to a public university has to pass some sort of math exam beforehand. Students who want to study anything where mathematics skills are needed would take the higher tier (Poziom rozszerzony).
Can someone from Poland confirm this?
A quick estimate of the percentage of high-school students taking the Polish Matura exams is 50%-75%, though. If the number of students taking the higher tier is not too large, then average performance on the basic tier corresponds to essentially average human-level performance on this kind of test.
Note that many students taking the basic math exam only want to pass and not necessarily perform well; and some of the bottom half of the 270k students are taking the exam for the second or third time after failing before.
My SERI MATS Application
I do not think the ratio of the “AI solves hardest problem” and “AI has Gold” probabilities is right here. Paul was at the IMO in 2008, but he might have forgotten some details...
(My qualifications here: high IMO Silver in 2016, but more importantly I was a Jury member on the Romanian Master of Mathematics recently. The RMM is considered the harder version of the IMO, and shares a good part of the Problem Selection Committee with it.)
The IMO Jury does not consider “bashability” of problems as a decision factor, in the regime where the bashing would take good contestants more than a few hours. But for a dedicated bashing program, it makes no difference.
It is extremely likely that an “AI” solving most IMO geometry problems is possible today—the main difficulty being converting the text into an algebraic statement. Given that, polynomial system solvers should easily tackle such problems.
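To illustrate the pipeline this suggests, here is a minimal sketch assuming Python with sympy. The toy statement and all names are my own, not from any actual IMO solver, and real contest problems would produce far larger systems: once a geometry statement is written in coordinates, checking it reduces to a polynomial system. The example verifies that a triangle’s three perpendicular bisectors are concurrent.

```python
import sympy as sp

# Place the triangle in coordinates: A = (0,0), B = (1,0), C = (p,q), q != 0.
x, y, p, q = sp.symbols('x y p q', real=True)
A, B, C = (0, 0), (1, 0), (p, q)

def sq_dist(P):
    # Squared distance from the variable point (x, y) to P.
    return (x - P[0])**2 + (y - P[1])**2

# The perpendicular bisector of PQ is the locus where the squared
# distances to P and Q agree; the difference is a linear polynomial.
bis_AB = sp.expand(sq_dist(A) - sq_dist(B))
bis_AC = sp.expand(sq_dist(A) - sq_dist(C))
bis_BC = sp.expand(sq_dist(B) - sq_dist(C))

# Intersect two of the bisectors to find their common point...
center = sp.solve([bis_AB, bis_AC], [x, y], dict=True)[0]

# ...and verify algebraically that the third bisector passes through it.
assert sp.simplify(bis_BC.subs(center)) == 0
print("circumcenter:", center)
```

The “convert text to an algebraic statement” step, which the comment above flags as the main difficulty, is done by hand here; the solving step is entirely mechanical.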
Say the order of the problems is (Day 1: CNG, Day 2: GAC). The geometry solver gives you 14 points. For a chance on IMO Gold, you have to solve the easiest combinatorics problem, plus one of either algebra or number theory.
Given the recent progress on coding problems as in AlphaCode, I place over 50% probability on IMO #1/#4 combinatorics problems being solvable by 2024. If that turns out to be true, then the “AI has Gold” event becomes “AI solves a medium N or a medium A problem, or both if contestants find them easy”.

Now, as noted elsewhere in the thread, there are various types of N and A problems that we might consider “easy” for an AI. Several IMOs in the last ten years contain those.
In 2015, the easiest five problems consisted of: two bashable G problems (#3, #4), an easy C (#1), a diophantine equation N (#2), and a functional equation A (#5). Given such a problemset, a dedicated AI might be able to score 35 points, without having capabilities remotely enough to tackle the combinatorics #6.
The only way the Gold probability could be comparable to the “hardest problem” probability is if the bet only takes general problem-solving models into account. Otherwise, the inductive bias one could build into such a model (e.g. resorting to a dedicated diophantine equation solver) helps much more in one than in the other.
Before someone points this out: Non-disclosure-by-default is a negative incentive for the academic side, if they care about publication metrics.
It is not a negative incentive for Conjecture in such an arrangement, at least not in an obvious way.
Do you ever plan on collaborating with researchers in academia, like DeepMind and Google Brain often do? What would make you accept or seek such external collaboration?
I am very sorry that you feel this way. I think it is completely fine for you, or anyone else, to have internal conflicts about your career or purpose. I hope you find a solution to your troubles in the following months.
Moreover, I think you did a useful thing, raising awareness about some important points:
“The amount of funding in 2022 exceeded the total cost of useful funding opportunities in 2022.”
“Being used to do everything in Berkeley, on a high budget, is strongly suboptimal in case of sudden funding constraints.”
“Why don’t we spend less money and donate the rest?”
Epistemic status for what follows: medium-high for the factual claims, low for the claims about potential bad optics. It might be that I’m worrying about nothing here.
However, I do not think this place should be welcoming of posts displaying bad rhetoric and epistemic practices.
Posts like this can hurt the optics of the research done in the LW/AF extended universe. What does a prospective AI x-safety researcher think when they get referred to this site and see this post above several alignment research posts?

EDIT: The above paragraph was off. See Ben’s excellent reply for a better explanation of why anyone should care.
I think this place should be careful about maintaining:
the epistemic standard of talking about falsifiable things;
the accepted rhetoric being fundamentally honest and straightforward, and always asking “compared to what?” before making claims;
the aversion to present uncertainties as facts.
For some examples:
I tried for 15 minutes to find a good faith reading of this, but I could not.
Most people would read this as “the hotel room costs $500 and the EA-adjacent community bought the complex of which that hotel is a part”, while being written in a way that only insinuates and does not commit to meaning exactly that. Insinuating bad-optics facts while maintaining plausible deniability, without checking the facts, is a horrible practice, usually employed by politicians and journalists.
The poster does not deliberately lie, but this is not enough when making a “very bad optics” statement that sounds like this one. At any point, they could have asked for the actual price of the hotel room, or about the condition of the actual hotel that might be bought.
This is true. But it is not much different from working a normal software job. The worst thing that can happen is getting fired after not delivering for several months. Some people survive years coasting until there is a layoff round.
An important counterfactual for a lot of people reading this is a PhD degree.
There is no punishment for failing to produce good research, except getting dropped from the program after a few years.
This might be true. Again, I think it would be useful to ask: what is the counterfactual?
All of this is applicable to anyone who starts working for Google or Facebook, if they were poor beforehand.
This feeling (regretting saving and not spending money) is incredibly common in all people that have good careers.
I would suggest going through the post with a cold head and removing parts which are not up to the standards.
Again, I am very sorry that you feel like this.