Seth Herd

Karma: 3,087

I did computational cognitive neuroscience research from getting my PhD in 2006 until the end of 2022. I’ve worked on computational theories of vision, executive function, episodic memory, and decision-making. I’ve focused on the emergent interactions that are needed to explain complex thought. I was increasingly concerned with AGI applications of the research, and reluctant to publish my best ideas. I’m incredibly excited to now be working directly on alignment, currently with generous funding from the Astera Institute. More info and publication list here.

[Question] What’s a better term now that “AGI” is too vague?

Seth Herd · 28 May 2024 18:02 UTC
15 points
8 comments · 2 min read · LW link

Anthropic announces interpretability advances. How much does this advance alignment?

Seth Herd · 21 May 2024 22:30 UTC
49 points
4 comments · 3 min read · LW link
(www.anthropic.com)

Instruction-following AGI is easier and more likely than value aligned AGI

Seth Herd · 15 May 2024 19:38 UTC
42 points
23 comments · 12 min read · LW link

Goals selected from learned knowledge: an alternative to RL alignment

Seth Herd · 15 Jan 2024 21:52 UTC
40 points
17 comments · 7 min read · LW link

After Alignment — Dialogue between RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC
15 points
2 comments · 25 min read · LW link

Corrigibility or DWIM is an attractive primary goal for AGI

Seth Herd · 25 Nov 2023 19:37 UTC
16 points
4 comments · 1 min read · LW link

Sapience, understanding, and “AGI”

Seth Herd · 24 Nov 2023 15:13 UTC
15 points
3 comments · 6 min read · LW link

Altman returns as OpenAI CEO with new board

Seth Herd · 22 Nov 2023 16:04 UTC
5 points
3 comments · 1 min read · LW link

OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns

Seth Herd · 20 Nov 2023 14:20 UTC
52 points
28 comments · 1 min read · LW link
(www.wired.com)

We have promising alignment plans with low taxes

Seth Herd · 10 Nov 2023 18:51 UTC
31 points
9 comments · 5 min read · LW link

Seth Herd’s Shortform

Seth Herd · 10 Nov 2023 6:52 UTC
6 points
17 comments · 1 min read · LW link

Shane Legg interview on alignment

Seth Herd · 28 Oct 2023 19:28 UTC
66 points
20 comments · 2 min read · LW link
(www.youtube.com)

The (partial) fallacy of dumb superintelligence

Seth Herd · 18 Oct 2023 21:25 UTC
27 points
5 comments · 4 min read · LW link

Steering subsystems: capabilities, agency, and alignment

Seth Herd · 29 Sep 2023 13:45 UTC
22 points
0 comments · 8 min read · LW link

AGI isn’t just a technology

Seth Herd · 1 Sep 2023 14:35 UTC
18 points
12 comments · 2 min read · LW link

Internal independent review for language model agent alignment

Seth Herd · 7 Jul 2023 6:54 UTC
53 points
26 comments · 11 min read · LW link

Simpler explanations of AGI risk

Seth Herd · 14 May 2023 1:29 UTC
8 points
9 comments · 3 min read · LW link

A simple presentation of AI risk arguments

Seth Herd · 26 Apr 2023 2:19 UTC
16 points
0 comments · 2 min read · LW link

Capabilities and alignment of LLM cognitive architectures

Seth Herd · 18 Apr 2023 16:29 UTC
83 points
18 comments · 20 min read · LW link

Agentized LLMs will change the alignment landscape

Seth Herd · 9 Apr 2023 2:29 UTC
153 points
95 comments · 3 min read · LW link