I don’t think that’s the view of whoever wrote the paragraph you’re quoting, but at this point we’re doing exegesis.
Hm, I think that paragraph is talking about the problem of getting an AI to care about a specific particular thing of your choosing (here diamond-maximising), not any arbitrary particular thing at all with no control over what it is. The MIRI-esque view thinks the former is hard and the latter happens inevitably.
Right, that makes complete sense in the case of LLM-based agents; I guess I was just thinking about much more directly goal-trained agents.
I like the distinction but I don’t think either aimability or goalcraft will catch on as Serious People words. I’m less confident about aimability (doesn’t have a ring to it) but very confident about goalcraft (too Germanic, reminiscent of fantasy fiction).
Is words-which-won’t-be-co-opted what you’re going for (a la notkilleveryoneism), or should we brainstorm words-which-could-plausibly-catch-on?
Perhaps, or perhaps not? I might be able to design a gun which shoots bullets in random directions (not on random walks), without being able to choose the direction.
Maybe we can back up a bit, and you could give some intuition for why you expect goals to go on random walks at all?
My default picture is that goals walk around during training and perhaps during a reflective process, and then stabilise somewhere.
I think that’s a reasonable point (but fairly orthogonal to the previous commenter’s one)
A gun which is not easily aimable doesn’t shoot bullets on random walks.
Or in less metaphorical language, the worry is mostly that it’s hard to give the AI the specific goal you want to give it, not so much that it’s hard to make it have any goal at all. I think people generally expect that naively training an AGI without thinking about alignment will get you a goal-directed system; it just might not have the goal you want it to.
Sounds like the propensity interpretation of probability.
FiO?
Nice job
I like the idea of a public research journal a lot, interested to see how this pans out!
You seem to be operating on a model that says “either something is obvious to a person, or it’s useful to remind them of it, but not both”, whereas I personally find it useful to be reminded of things that I consider obvious, and I think many others do too. Perhaps you don’t, but could it be the case that you’re underestimating the extent to which it applies to you too?
I think one way to understand it is to disambiguate ‘obvious’ a bit and distinguish what someone knows from what’s salient to them.
If someone reminds me that sleep is important and I thank them for it, you could say “I’m surprised you didn’t know that already,” but of course I did know it already—it just hadn’t been salient enough to me to have as much impact on my decision-making as I’d like it to.
I think this post is basically saying: hey, here’s a thing that might not be as salient to you as it should be.
Maybe everything is always about the right amount of salient to you already! If so you are fortunate.
I think it falls into the category of ‘advice which is of course profoundly obvious but might not always occur to you’, in the same vein as ‘if you have a problem, you can try to solve it’.
When you’re looking for something you’ve lost, it’s genuinely helpful when somebody says ‘where did you last have it?’, and not just for people with some sort of looking-for-stuff-atypicality.
I think I practice something similar to this with selfishness: a load-bearing part of my epistemic rationality is having it feel acceptable that I sometimes (!) do things for selfish rather than altruistic reasons.
You can make yourself feel that selfish acts are unacceptable and hope this will make you very altruistic and not very selfish, but in practice it also makes you come up with delusional justifications as to why selfish acts are in fact altruistic.
From an impartial standpoint we can ask how much of the latter is worth it for how much of the former. I think one of life’s repeated lessons is that sacrificing your epistemics for instrumental reasons is almost always a bad idea.
Do people actually disapprove of and disagree with this comment, or do they disapprove of the use of said ‘poetic’ language in the post? If the latter, perhaps they should downvote the post and upvote the comment for honesty.
Perhaps there should be a react for “I disapprove of the information this comment revealed, but I’m glad it admitted it”.
Linkpost: Rishi Sunak’s Speech on AI (26th October)
LLMs calculate pdfs, regardless of whether they calculate ‘the true’ pdf.
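(A minimal sketch of what I mean, with made-up logits rather than a real model: the final softmax step turns a model’s raw scores into a distribution over tokens that sums to one, whether or not it matches ‘the true’ distribution. Strictly it’s a pmf here, since tokens are discrete.)

```python
# Toy illustration, not a real model: softmax turns arbitrary logits
# into a valid probability distribution over a (hypothetical) vocabulary.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]       # hypothetical tiny vocabulary
logits = np.array([2.0, 0.5, -1.0, 0.1, 1.2])    # made-up scores standing in for model output

probs = np.exp(logits - logits.max())            # subtract max for numerical stability
probs /= probs.sum()                             # normalise so the values sum to 1

print(dict(zip(vocab, probs.round(3))))          # a distribution either way, "true" or not
```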
Sometimes I think trying to keep up with the endless stream of new papers is like watching the news—you can save yourself time and become better informed by reading up on history (ie classic papers/textbooks) instead.
This is a comforting thought, so I’m a bit suspicious of it. But also it’s probably more true for a junior researcher not committed to a particular subfield than for someone who’s already fully specialised.
Sometimes such feelings are your system 1 tracking real/important things that your system 2 hasn’t figured out yet.
Augmenting humans to do better alignment research seems like a pretty different proposal to building artificial alignment researchers.
The former is about making (presumed-aligned) humans more intelligent, which is a biology problem, while the latter is about making (presumed-intelligent) AIs aligned, which is a computer science problem.