Zygi Straznickas

Karma: 61

Previously: math for AI and AI for math

Now just kinda trying to figure out if AI can be made safe

Fluent dreaming for language models (AI interpretability method)

tbenthompson, mikes and Zygi Straznickas

6 Feb 2024 6:02 UTC

45 points

5 comments1 min readLW link

(arxiv.org)

Zygi Straznickas 15 Jan 2024 23:36 UTC
6 points
4
in reply to: Roko’s comment on: Gender Exploration
Whether it’s a big deal depends on the person, but one objective piece of evidence is male plastic surgery statistics: looking at US plastic surgery statistics for year 2022, surgery for gynecomastia (overdevelopment of breasts) is the most popular surgery for men: 23k surgeries per year total, 50% of total male body surgeries and 30% of total male cosmetic surgeries. So it seems that not having breasts is likely quite important for a man’s body image.

(note that this ignores base rates, with more effort one could maybe compare the ratios of [prevalence vs number of surgeries] for other indications, but that’s very hard to quantify for other surgeries like tummy tuck—what’s the prevalence of needing a tummy tuck?)

Zygi Straznickas 6 May 2023 17:57 UTC
1 point
0
in reply to: paulfchristiano’s comment on: Prizes for matrix completion problems
Thanks for the clarification! I assumed the question was using the bit computation model (biased by my own experience), in which case the complexity of SDP in the general case still seems to be unknown (https://math.stackexchange.com/questions/2619096/can-every-semidefinite-program-be-solved-in-polynomial-time)

Zygi Straznickas 6 May 2023 16:55 UTC
7 points
2
in reply to: Zygi Straznickas’s comment on: Prizes for matrix completion problems
Nope, I’m withdrawing this answer. I looked closer into the proof and I think it’s only meaningful asymptotically, in a low rank regime. The technique doesn’t work for analyzing full-rank completions.

Zygi Straznickas 6 May 2023 2:44 UTC
5 points
−3
on: Prizes for matrix completion problems
Question 1 is very likely NP-hard. https://arxiv.org/abs/1203.6602 shows:
We consider the decision problem asking whether a partial rational symmetric matrix with an all-ones diagonal can be completed to a full positive semidefinite matrix of rank at most k. We show that this problem is $\NP$-hard for any fixed integer k≥2.
I’m still stretching my graph-theory muscles to see if this technically holds for $k = \infty$ (and so precisely implies the NP-hardness of Question 1.) But even without that, for applied use cases we can just fix $k$ to a very large number to see that practical algorithms are unlikely.

Zygi Straznickas 6 Apr 2023 20:56 UTC
1 point
0
in reply to: DragonGod’s comment on: Is “Recursive Self-Improvement” Relevant in the Deep Learning Paradigm?
Thanks for clarifying. Yeah, I agree the argument is mathematically correct, but it kinda doesn’t seem to apply to historic cases of intelligence increase that we have:
- Human intelligence is a drastic jump from primate intelligence but this didn’t require a drastic jump in “compute resources”, and took comparably little time in evolutionary terms.
- In human history, our “effective intelligence”—capability of making decisions with the use of man-made tools—grows at an increasing rate, not decreasing
I’m still thinking about how best to reconcile this with the asymptotics. I think the other comments are right in that we’re still at the stage where improving the constants is very viable.

Zygi Straznickas 6 Apr 2023 8:07 UTC
4 points
1
on: Is “Recursive Self-Improvement” Relevant in the Deep Learning Paradigm?
This is a solid argument inasmuch as we define RSI to be about self-modifying its own weights/other-inscrutable-reasoning-atoms. That does seem to be quite hard given our current understanding.
But there are tons of opportunities for an agent to improve its own reasoning capacity otherwise. At a very basic level, the agent can do at least two other things:
1. Make itself faster and more energy efficient—in the DL paradigm, techniques like quantization, distillation and pruning seem to be very effective when used by humans and keep improving, so it’s likely an AGI would improve them further.
2. Invent computational tools: wrt
Most problems in computer science have superlinear time complexity
on one hand sure, improving this is (likely) impossible in the limit because of fundamental complexity properties. On the other hand, the agent can still become vastly smarter than humans. A particular example: the human mind, without any assistance, is very bad at solving 3SAT. But we’ve invented computers, and then constraint solvers, and now are able to solve 3SAT much much faster, even though 3SAT is (likely) exponentially hard. So the RSI argument here is, the smarter (or faster) the model is, the more special-purpose tools it can create to efficiently solve specific problems and thus upgrade its reasoning ability. Not to infinity, but likely far beyond humans.

Zygi Straznickas 26 Jan 2023 4:01 UTC
3 points
0
on: Can we efficiently distinguish different mechanisms?
In particular, right now I don’t have even a single example of a function f such that (i) there are two clearly distinct mechanisms that can lead to f(x) = 1, (ii) there is no known efficient discriminator for distinguishing those mechanisms.
I think I might have an example! (even though it’s not 100% satisfying—there’s some logical inversions needed to fit the format. But I still think it fits the spirit of the question.)
It’s in the setting of polynomial identity testing: let $F$ be a finite field and $p (x)$ a polynomial over that field, and assume we have access to the representation of $p$ (i.e. not just black-box access).
Two questions we can ask about $p$ are:
- EZE (Evaluates to Zero Everywhere): does $p$ evaluate to zero on all inputs? That is, is it the case that $\forall x, p (x) = 0$ ? Unfortunately, that’s coNP-hard.
- PIT (Polynomial Identity Testing) -- decide if $p$ is equal to the zero polynomial, that is, if all of $p$ ’s coefficients are 0. (Note that these are two distinct concepts: in the two-value field $F_{2}$ we have $x^{2} - x = 0$ for all $x$ , even though the polynomial is not zero). Polynomial identity testing can be done efficiently in probabilistic polynomial time.
Now, make $f$ be the inverted Polynomial Identity Testing algorithm: $f (p) = 0$ if $p$ is zero, and $1$ if it’s not zero. Then, assume our perspective is that EZE is the “fundamental” property. In that case $f (p)$ can be $1$ for two reasons: either the polynomial $p$ doesn’t Evaluate to Zero Everywhere, or it’s one of those rare special cases mentioned above where it Evaluates to Zero Everywhere but isn’t identically zero. These to me seem like clearly distinct mechanisms. But to distinguish between them we need to check if $p$ Evaluates to Zero Everywhere, and that’s co-NP hard, so an efficient discriminator (likely) doesn’t exist.

Zygi Straznickas

Fluent dream­ing for lan­guage mod­els (AI in­ter­pretabil­ity method)

Fluent dreaming for language models (AI interpretability method)