Zbetna Fvapynver [rot13]
Grue_Slinky
What are we assuming about utility functions?
To be clear, I unendorsed the idea about a minute after posting because it felt like more of a low-effort shitpost than a constructive idea for understanding the world (and I don’t want to make that a norm on shortform). That said, I had in mind that you’re describing the thing to someone you can’t communicate with beforehand, except there’s common knowledge that you’re forbidden any nouns besides “cake”. In practice I feel like it degenerates into putting all the meaning on adjectives to construct the nouns you’d want to use, e.g. your own “speaking cake” to denote a person, or “flat, vertical, compartmentalizing cakes” to denote walls. Of course you’d have to ban “-like” and “-esque” constructions and similar things, but it’s not clear to me whether the boundaries there are too fuzzy to make a good rule set.
Actually, maybe this could be a board game similar to charades. You get a random word such as “elephant”, and you write down a description of it under this constraint. Then the description is gradually read off, and your team tries to guess the word from the description. It’s the inverse of charades in that the reading is done in a monotone and without body language (and could even be done by the other team).
K-complexity: The minimum description length of something (relative to some fixed description language)
Cake-complexity: The minimum description length of something, where the only noun you can use is “cake”
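For reference, a minimal formal sketch of the first one, assuming a fixed universal description language $U$ (the cake variant, as far as I know, has no agreed formalization):

$$K(x) = \min \{\, \lvert p \rvert : U(p) = x \,\}$$

i.e. the length of the shortest description $p$ that $U$ expands into $x$.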
I often hear about deepfakes—pictures/videos that can be entirely synthesized by a deep learning model and made to look real—and how this could greatly amplify the “fake news” phenomenon and really undermine the ability of the public to actually evaluate evidence.
And this sounds like a well-founded worry, but then I was just thinking, what about Photoshop? That’s existed for over a decade, and for all that time it’s been possible to doctor images to look real. So why should deepfakes be any scarier?
Part of it could be that we can fake videos, not just images, but that can’t be all of it.
I suspect the main reason is that in the future, deepfakes will also be able to fool experts. This does seem like an important threshold.
This raises another question: is it, in fact, impossible to fool experts with Photoshop? Are there fundamental limitations that prevent it from being this potent, and was this always understood, so that people weren’t particularly fearful of it? (FWIW, when I learned about Photoshop as a kid I freaked out with Orwellian visions even worse than people have with deepfakes now, and pretty much only relaxed out of conformity. I remain ignorant about the technical details of Photoshop and its capabilities.)
But even if deepfakes are bound to cross this threshold (not that it’s a fine line) in a way Photoshop never could, aren’t there also plenty of things which experts have had, and still have, trouble classifying as real or fake? Wikipedia’s list of hoaxes is extensive, albeit most of those fooled the public rather than experts. But I feel like there are plenty of hoaxes that lasted hundreds of years before being debunked (the Shroud of Turin, or maybe fake fossils?).
I guess we’re just used to seeing fewer hoaxes in modern times. Like, in the past hoaxes abounded, and there often weren’t the proper experts around to debunk them, so those times probably warranted a greater degree of epistemic learned helplessness or something. But over the last century our forgery-spotting techniques have gotten a lot better while the corresponding forgeries just haven’t kept up, so we happen to live in a time where the “offense” is relatively weaker than the “defense”; there’s no particular reason it should stay that way.
I’m really not sure how worried I should be about deepfakes, but having just thought through all that, it does seem like the existence of “evidence” in political discourse is not an all-or-nothing phenomenon. Images/videos will likely come to be trusted less, maybe other things as well if deep learning contributes in other ways to the “offense” more than the “defense”. And maybe things will reach a not-so-much-worse equilibrium. Or maybe not, but the deepfake phenomenon certainly does not seem completely new.
Grue_Slinky’s Shortform
Is this open thread not going to be a monthly thing?
FWIW I liked reading the comment threads here, and would be inclined to participate in the future. But that’s just my opinion. I’m curious if more senior people had reasons for not liking the idea?
The Quantity/Quality of Researchers Has Drastically Increased Over the Centuries
I’m not asking about the Fermi paradox, and it’s unclear to me how that’s related. I’m wondering why we think general (i.e. human-level) intelligence is possible in our universe, if we’re not allowed to invoke anthropic evidence. For instance, here are some possible ways one can answer my question [rot13’d to avoid spoiling people’s answers]:
1. Nethr gung aba-cevzngr navzny vagryyvtrapr nyernql trgf hf “zbfg bs gur jnl gurer”, naq tvira gur nccebcevngr raivebazrag, vg fubhyq or cbffvoyr va cevapvcyr sbe n fhpprffvba bs navzny fcrpvrf gb ribyir gur erznvavat pncnovyvgvrf.
2. [Rnfl Zbqr] Nethr gung qrrc yrneavat nyernql trgf hf “zbfg bs gur jnl gurer”, naq vg fubhyq or cbffvoyr, jvgu rabhtu genvavat qngn naq gur nccebcevngr nytbevguzvp gjrnxf, gb trg n qrrc yrneavat nytbevguz gb qb trareny-checbfr ernfbavat engure rssvpvragyl.
3. [Uneq Zbqr] Nethr gung uvtu-yriry zngurzngvpny pbafgehpgf, fhpu nf havirefny Ghevat znpuvarf (be creuncf zber pbaivapvatyl, NVKV), fubj gung yrneavat va irel trareny raivebazragf vf cbffvoyr jvgu hayvzvgrq pbzchgr. Gura nethr gung zhpu bs guvf pna or nccebkvzngrq ol srnfvoyr (r.t. cbylabzvny gvzr) nytbevguzf gung pna eha va erny-gvzr ba n oenva/pbzchgre zhpu fznyyre guna n cynarg.

I’m unsure what information you need about what the “you” in this counterfactual is. Beyond “an alien from a different universe whose general-purpose reasoning algorithms are different enough from those of Earth-based animals that they can’t infer anything about the potential of Earth-based biological intelligence”, I’d be unable to give the details of those algorithms (and it shouldn’t matter anyways?).
[Question] Non-anthropically, what makes us think human-level intelligence is possible?
Huh, that’s a good point. Whereas it seems probably inevitable that AI research would’ve eventually converged on something similar to the current D(R)L paradigm, we can imagine a lot of different ways AI safety could look right now instead. Which makes sense, since the latter is still young and in a kind of pre-paradigmatic philosophical stage, with little unambiguous feedback to dictate how things should unfold (and it’s far from clear when substantially more of this feedback will show up).
I can imagine an alternate timeline where the initial core ideas/impetus for AI safety didn’t come from Yudkowsky/LW, but from e.g. a) Bostrom/FHI, b) Stuart Russell, or c) some near-term ML safety researchers whose thinking gradually evolved as they thought about longer and longer timescales. And it’s interesting to ask what the current field would consequently look like:
Agent Foundations/Embedded Agency probably (?) wouldn’t be a thing, or at least it might take some time for the underlying questions which motivate it to be asked in writing, let alone the actual questions within those agendas (or something close to them)
For (c) primarily, it’s unclear whether the alignment problem would’ve been zeroed in on as the “central challenge”, or how long this would take (note: I don’t actually know that much about near-term concerns, but I can imagine things like verification, adversarial examples, and algorithmic fairness lingering around on center stage for a while).
A lot of the focus on utility functions probably wouldn’t be there
And none of that is to say that anything about those alternate timelines would be better; it is to say that a lot of the things I often associate with AI safety are only contingently related to it. This is probably obvious to a lot of people on here, and of course we have seen some of the Yudkowskian foundational framings of the problem get de-emphasized as non-LW people have joined the field.
On the other hand, as far as “lock-in” itself is concerned, it does seem like there’s a certain amount of deference that EA has given MIRI/LW on some of the more abstruse matters where would-be critics don’t want to sound stupid for lack of technical sophistication—UDT, Solomonoff, and similar stuff internal to agent foundations—and the longer any idea lingers around, and the farther it spreads, the harder it is to root out if we ever do find good reasons to overturn it. Although I’m not that worried about this, since those ideas are by definition only fully understood/debated by a small part of the community.
Also, it’s my impression that most EAs believe in one-boxing, but not necessarily UDT. For instance, some apparently prefer EDT-like theories, which makes me think the relatively simple arguments for one-boxing have percolated pretty widely (and are probably locked in), but the more advanced details are still largely up for debate. I think similar things can be said for a lot of other things, e.g. “thinking probabilistically” is locked in but maybe not a lot of the more complicated aspects of Bayesian epistemology that have come out of LW.
[Question] What are concrete examples of potential “lock-in” in AI research?
Yes, perhaps I should’ve been clearer. Learning certain distance functions is a practical solution to some things, so maybe the phrase “distance functions are hard” is too simplistic. What I meant to say is more like:
Fully-specified distance functions are hard, over and above the difficulty of formally specifying most things, and it’s often hard to notice this difficulty
This is mostly applicable to Agent Foundations-like research, where we are trying to give a formal model of (some aspect of) how agents work. Sometimes, we can reduce our problem to defining the appropriate distance function, and it can feel like we’ve made some progress, but we haven’t actually gotten anywhere (the first two examples in the post are like this).
The 3rd example, where we are trying to formally verify an ML model against adversarial examples, is a bit different now that I think of it. Here we apparently need a transparent, formally specified distance function if we have any hope of absolutely proving the absence of adversarial examples. And in formal verification, the specification problem often is just philosophically hard like this. So I suppose this example is less insightful, except insofar as it lends extra intuitions for the other class of examples.
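To make that concrete, here is a minimal sketch of the kind of property such a verifier has to prove, assuming (as is common) that the distance function is an $L_\infty$ ball of radius $\varepsilon$ around an input $x$, with $f$ the model’s output scores:

$$\forall x'.\; \lVert x' - x \rVert_\infty \le \varepsilon \;\Rightarrow\; \arg\max_i f_i(x') = \arg\max_i f_i(x)$$

All the philosophical fuzziness about what counts as “perceptually the same image” has to be squeezed into that single choice of norm and $\varepsilon$ before the proof can even be stated.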
Distance Functions are Hard
That all seems pretty fair.
That’s why I distinguished between the hypotheses of “human utility” and CEV. It is my vague understanding (and I could be wrong) that some alignment researchers see it as their task to align AGI with current humans and their values, thinking the “extrapolation” less important or that it will take care of itself, while others consider extrapolation an important part of the alignment problem. For the former group, human utility is more salient, while the latter probably cares more about the CEV hypothesis (and the arguments you list in favor of it).
My intuitions tend to agree, but I’m also inclined to ask “why not?” E.g. even if my preferences are absurdly cyclical, if we get an AGI to imitate me perfectly (or me + faster thinking + more information), in what sense of the word is it “unaligned” with me? More generally, what is it about violating these other coherence conditions that prevents meaningful “alignment”? (Maybe this opens a big discursive can of worms, but I actually haven’t seen it discussed on a serious level, so I’m quite happy to just read references.)
Hadn’t thought about it this way. Partially updated (but still unsure what I think).