I don’t think that’s the view of whoever wrote the paragraph you’re quoting, but at this point we’re doing exegesis.
Hm, I think that paragraph is talking about the problem of getting an AI to care about a specific particular thing of your choosing (here diamond-maximising), not any arbitrary particular thing at all with no control over what it is. The MIRI-esque view thinks the former is hard and the latter happens inevitably.
Right, that makes complete sense in the case of LLM-based agents; I guess I was just thinking about much more directly goal-trained agents.
I like the distinction but I don’t think either aimability or goalcraft will catch on as Serious People words. I’m less confident about aimability (doesn’t have a ring to it) but very confident about goalcraft (too Germanic, reminiscent of fantasy fiction).
Is words-which-won’t-be-co-opted what you’re going for (a la notkilleveryoneism), or should we brainstorm words-which-could-plausibly-catch-on?
Perhaps, or perhaps not? I might be able to design a gun which shoots bullets in random directions (not on random walks), without being able to choose the direction.
Maybe we can back up a bit, and you could give some intuition for why you expect goals to go on random walks at all?
My default picture is that goals walk around during training and perhaps during a reflective process, and then stabilise somewhere.
I think that’s a reasonable point (but fairly orthogonal to the previous commenter’s one)
A gun which is not easily aimable doesn’t shoot bullets on random walks.
Or in less metaphorical language, the worry is mostly that it’s hard to give the AI the specific goal you want to give it, not so much that it’s hard to make it have any goal at all. I think people generally expect that naively training an AGI without thinking about alignment will get you a goal-directed system; it just might not have the goal you want it to.
Sounds like the propensity interpretation of probability.
FiO?
Nice job
I like the idea of a public research journal a lot, interested to see how this pans out!
You seem to be operating on a model that says “either something is obvious to a person, or it’s useful to remind them of it, but not both”, whereas I personally find it useful to be reminded of things that I consider obvious, and I think many others do too. Perhaps you don’t, but could it be the case that you’re underestimating the extent to which it applies to you too?
I think one way to understand it is to disambiguate ‘obvious’ a bit and distinguish what someone knows from what’s salient to them.
If someone reminds me that sleep is important and I thank them for it, you could say “I’m surprised you didn’t know that already,” but of course I did know it already—it just hadn’t been salient enough to me to have as much impact on my decision-making as I’d like it to.
I think this post is basically saying: hey, here’s a thing that might not be as salient to you as it should be.
Maybe everything is always about the right amount of salient to you already! If so you are fortunate.
I think it falls into the category of ‘advice which is of course profoundly obvious but might not always occur to you’, in the same vein as ‘if you have a problem, you can try to solve it’.
When you’re looking for something you’ve lost, it’s genuinely helpful when somebody says ‘where did you last have it?’, and not just for people with some sort of looking-for-stuff-atypicality.
I think I practice something similar to this with selfishness: a load-bearing part of my epistemic rationality is having it feel acceptable that I sometimes (!) do things for selfish rather than altruistic reasons.
You can make yourself feel that selfish acts are unacceptable and hope this will make you very altruistic and not very selfish, but in practice it also makes you come up with delusional justifications as to why selfish acts are in fact altruistic.
From an impartial standpoint we can ask how much of the latter is worth it for how much of the former. I think one of life’s repeated lessons is that sacrificing your epistemics for instrumental reasons is almost always a bad idea.
Do people actually disapprove of and disagree with this comment, or do they disapprove of the use of said ‘poetic’ language in the post? If the latter, perhaps they should downvote the post and upvote the comment for honesty.
Perhaps there should be a react for “I disapprove of the information this comment revealed, but I’m glad it admitted it”.
Linkpost: Rishi Sunak’s Speech on AI (26th October)
LLMs calculate pdfs, regardless of whether they calculate ‘the true’ pdf.
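(A minimal sketch of what I mean, with made-up logits rather than a real model: the final softmax step turns a model’s raw scores into a distribution over tokens that sums to one, whether or not it matches ‘the true’ distribution. Strictly it’s a pmf here, since tokens are discrete.)

```python
# Toy illustration, not a real model: softmax turns arbitrary logits
# into a valid probability distribution over a (hypothetical) vocabulary.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]       # hypothetical tiny vocabulary
logits = np.array([2.0, 0.5, -1.0, 0.1, 1.2])    # made-up scores standing in for model output

probs = np.exp(logits - logits.max())            # subtract max for numerical stability
probs /= probs.sum()                             # normalise so the values sum to 1

print(dict(zip(vocab, probs.round(3))))          # a distribution either way, "true" or not
```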
Sometimes I think trying to keep up with the endless stream of new papers is like watching the news—you can save yourself time and become better informed by reading up on history (ie classic papers/textbooks) instead.
This is a comforting thought, so I’m a bit suspicious of it. But also it’s probably more true for a junior researcher not committed to a particular subfield than for someone who’s already fully specialised.
Sometimes such feelings are your system 1 tracking real/important things that your system 2 hasn’t figured out yet.
Augmenting humans to do better alignment research seems like a pretty different proposal to building artificial alignment researchers.
The former is about making (presumed-aligned) humans more intelligent, which is a biology problem, while the latter is about making (presumed-intelligent) AIs aligned, which is a computer science problem.