I come to you with a dollar I want to spend on AI. You can allocate p pennies to go to capabilities and 100-p pennies to go to alignment, but only if you know of a project that realizes that allocation. For example, we might think that GAN research sets p = 98 (providing 2 cents to alignment) while interpretability research sets p = 10 (providing 90 cents to alignment).
Is this remotely useful? This is a really rough model (you might think it’s more of a venn diagram and that this model doesn’t provide a way of reasoning about the double counting problem).
a task: rate research areas, even whole agendas, with such value p. Many people may disagree about my example assignments to GANs and interpretability, or think both of those are too broad.
What are some alternatives to the splitting a dollar intuition?
To say something is capabilities-prone is less to say a dollar has been cleanly split, and more to say that there are some dynamics that sort of tend toward or get pushed toward different directions. Perhaps I want some sort of fluid metaphor instead.
capabilities-prone research.
I come to you with a dollar I want to spend on AI. You can allocate
p
pennies to go to capabilities and100-p
pennies to go to alignment, but only if you know of a project that realizes that allocation. For example, we might think that GAN research setsp = 98
(providing 2 cents to alignment) while interpretability research setsp = 10
(providing 90 cents to alignment).Is this remotely useful? This is a really rough model (you might think it’s more of a venn diagram and that this model doesn’t provide a way of reasoning about the double counting problem).
a task: rate research areas, even whole agendas, with such value
p
. Many people may disagree about my example assignments to GANs and interpretability, or think both of those are too broad.What are some alternatives to the splitting a dollar intuition?
To say something is capabilities-prone is less to say a dollar has been cleanly split, and more to say that there are some dynamics that sort of tend toward or get pushed toward different directions. Perhaps I want some sort of fluid metaphor instead.