Daniel Paleka

Karma: 274

Daniel Paleka Aug 23, 2022, 9:50 PM
LW: 23 AF: 12
11
AF
on: AGI Timelines Are Mostly Not Strategically Relevant To Alignment
I think the timelines (as in, <10 years vs 10-30 years) are very correlated with the answer to “will first dangerous models look like current models”, which I think matters more for research directions than what you allow in the second paragraph.

For example, interpretability in transformers might completely fail on some other architectures, for reasons that have nothing to do with deception. The only insight from the 2022 Anthropic interpretability papers I see having a chance of generalizing to non-transformers is the superposition hypothesis / SoLU discussion.

Daniel Paleka Aug 21, 2022, 6:11 PM
5 points
3
on: What’s the Least Impressive Thing GPT-4 Won’t be Able to Do
The context window will still be much smaller than human; that is, single-run performance on summarization of full-length books will be much lower than of <=1e4 token essays, no matter the inherent complexity of the text.

Braver prediction, weak confidence: there will be no straightforward method to use multiple runs to effectively extend the context window in the three months after the release of GPT-4.

Daniel Paleka Aug 9, 2022, 7:18 AM
2 points
1
on: Team Shard Status Report
I am eager to see how the mentioned topics connect in the end—this is like the first few chapters in a book, reading the backstories of the characters which are yet to meet.
On the interpretability side—I’m curious how you do causal mediation analysis on anything resembling “values”? The ROME paper framework shows where the model recalls “properties of an object” in the computation graph, but it’s a long way from that to editing out reward proxies from the model.

Daniel Paleka Jul 1, 2022, 1:20 PM
7 points
0
in reply to: fiso64’s comment on: [Linkpost] Solving Quantitative Reasoning Problems with Language Models
They test on the basic (Poziom podstawowy) Matura tier for testing on math problems.
In countries with Matura-based education, the basic tier math test is not usually taken by mathematically inclined students—it is just the law that anyone going to a public university has to pass some sort of math exam beforehand. Students who want to study anything where mathematics skills are needed would take the higher tier (Poziom rozszezony).
Can someone from Poland confirm this?

A quick estimate of the percentage of high-school students taking the Polish Matura exams is 50%-75%, though. If the number of students taking the higher tier is not too large, then average performance on the basic tier corresponds to essentially average human-level performance on this kind of test.

Note that many students taking the basic math exam only want to pass and not necessarily perform well; and some of the bottom half of the 270k students are taking the exam for the second or third time after failing before.

Daniel Paleka Apr 10, 2022, 8:28 PM
6 points
on: IMO challenge bet with Eliezer
I do not think the ratio of the “AI solves hardest problem” and “AI has Gold” probabilities is right here. Paul was at the IMO in 2008, but he might have forgotten some details...
(My qualifications here: high IMO Silver in 2016, but more importantly I was a Jury member on the Romanian Master of Mathematics recently. The RMM is considered the harder version of the IMO, and shares a good part of the Problem Selection Committee with it.)
The IMO Jury does not consider “bashability” of problems as a decision factor, in the regime where the bashing would take good contestants more than a few hours. But for a dedicated bashing program, it makes no difference.

It is extremely likely that an “AI” solving most IMO geometry problems is possible today—the main difficulty being converting the text into an algebraic statement. Given that, polynomial system solvers should easily tackle such problems.
Say the order of the problems is (Day 1: CNG, Day 2: GAC). The geometry solver gives you 14 points. For a chance on IMO Gold, you have to solve the easiest combinatorics problem, plus one of either algebra or number theory.

Given the recent progress on coding problems as in AlphaCode, I place over 50% probability on IMO #1/#4 combinatorics problems being solvable by 2024. If that turns out to be true, then the “AI has Gold” event becomes “AI solves a medium N or a medium A problem, or both if contestants find them easy”.
Now, as noted elsewhere in the thread, there are various types of N and A problems that we might consider “easy” for an AI. Several IMOs in the last ten years contain those.
In 2015, the easiest five problems consisted out of: two bashable G problems (#3, #4), an easy C (#1), a diophantine equation N (#2) and a functional equation A (#5). Given such a problemset, a dedicated AI might be able to score 35 points, without having capabilities remotely enough to tackle the combinatorics #6.

The only way the Gold probability could be comparable to “hardest problem” probability is if the bet only takes general problem-solving models into account. Otherwise, inductive bias one could build into such a model (e.g. resort to a dedicated diophantine equation solver) helps much more in one than in the other.
What links here?
- Matthew Barnett's comment on A concrete bet offer to those with short AGI timelines by Matthew Barnett (Mar 14, 2023, 2:10 AM; 34 points)

Daniel Paleka Apr 9, 2022, 9:22 PM
1 point
in reply to: Daniel Paleka’s comment on: AMA Conjecture, A New Alignment Startup
Before someone points this out: Non-disclosure-by-default is a negative incentive for the academic side, if they care about publication metrics.
It is not a negative incentive for Conjecture in such an arrangement, at least not in an obvious way.

Daniel Paleka Apr 9, 2022, 9:21 PM
3 points
on: AMA Conjecture, A New Alignment Startup
Do you ever plan on collaborating with researchers in academia, like DeepMind and Google Brain often do? What would make you accept or seek such external collaboration?